Novel glyphosate-N-acetyltransferase (GAT) genes

ABSTRACT

Novel proteins are provided herein, including proteins capable of catalyzing the acetylation of glyphosate and other structurally related proteins. Also provided are novel polynucleotides capable of encoding these proteins, compositions that include one or more of these novel proteins and/or polynucleotides, recombinant cells and transgenic plants comprising these novel compounds, diversification methods involving the novel compounds, and methods of using the compounds. Some of the novel methods and compounds provided herein can be used to render an organism, such as a plant, resistant to glyphosate.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of co-pending U.S. application Ser. No. 10/835,615, filed Apr. 29, 2004, which is hereby incorporated in its entirety by reference herein.

COPYRIGHT NOTIFICATION PURSUANT TO 37 C.F.R. § 1.71(E)

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS-WEB

The official copy of the sequence listing is submitted electronically via EFS-Web as an ASCII formatted sequence listing with a file name of 035718-325185SEQLIST.txt, created on Aug. 23, 2007, and having a size of 1.19 MB, and is filed concurrently with the specification. The sequence listing contained in this ASCII formatted document is part of the specification and is herein incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

Crop selectivity to specific herbicides can be conferred by engineering genes into crops which encode appropriate herbicide metabolizing enzymes. In some cases these enzymes, and the nucleic acids that encode them, originate in a plant. In other cases, they are derived from other organisms, such as microbes. See, e.g., Padgette et al. (1996) “New weed control opportunities: Development of soybeans with a Round UP Ready™ gene” and Vasil (1996) “Phosphinothricin-resistant crops”, both in Herbicide-Resistant Crops, ed. Duke (CRC Press, Boca Raton, Fla.) pp. 54-84 and pp. 85-91. Indeed, transgenic plants have been engineered to express a variety of herbicide tolerance/metabolizing genes from a variety of organisms. For example, acetohydroxy acid synthase, which has been found to make plants that express this enzyme resistant to multiple types of herbicides, has been introduced into a variety of plants (see, e.g., Hattori et al. (1995) Mol. Gen. Genet. 246: 419). Other genes that confer tolerance to herbicides include: a gene encoding a chimeric protein of rat cytochrome P4507A1 and yeast NADPH-cytochrome P450 oxidoreductase (Shiota et al. (1994) Plant Physiol. 106: 17), genes for glutathione reductase and superoxide dismutase (Aono et al. (1995) Plant Cell Physiol. 36: 1687), and genes for various phosphotransferases (Datta et al. (1992) Plant Mol. Biol. 20: 619).

One herbicide which is the subject of much investigation in this regard is N-phosphonomethylglycine, commonly referred to as glyphosate. Glyphosate is the top selling herbicide in the world, with sales projected to reach $5 billion by 2003. It is a broad spectrum herbicide that kills both broadleaf and grass-type plants. One commercially successful approach to glyphosate resistance in transgenic plants is the introduction of a modified Agrobacterium CP4 5-enolpyruvylshikimate-3-phosphate synthase (hereinafter referred to as EPSP synthase or EPSPS) gene. The transgene product is targeted to the chloroplast, where it continues to synthesize EPSP from phosphoenolpyruvic acid (PEP) and shikimate-3-phosphate in the presence of glyphosate. In contrast, the native EPSP synthase is inhibited by glyphosate. Without the transgene, plants sprayed with glyphosate quickly die because inhibition of EPSP synthase halts the downstream pathway needed for aromatic amino acid, hormone, and vitamin biosynthesis. The CP4 glyphosate-resistant soybean transgenic plants are marketed, e.g., by Monsanto under the name “Round UP Ready™.”

In the environment, the predominant mechanism by which glyphosate is degraded is through soil microflora metabolism. The primary metabolite of glyphosate in soil has been identified as aminomethylphosphonic acid (AMPA), which is ultimately converted into ammonia, phosphate and carbon dioxide. The proposed metabolic scheme that describes the degradation of glyphosate in soil through the AMPA pathway is shown in FIG. 8. An alternative metabolic pathway for the breakdown of glyphosate by certain soil bacteria, the sarcosine pathway, occurs via initial cleavage of the C—P bond to give inorganic phosphate and sarcosine, as depicted in FIG. 9.

Another successful herbicide/transgenic crop package is glufosinate (phosphinothricin) and the Liberty Link™ trait marketed, e.g., by Aventis. Glufosinate is also a broad spectrum herbicide. Its target is the enzyme glutamine synthetase. Resistant plants carry the bar gene from Streptomyces hygroscopicus and achieve resistance by the N-acetylation activity of bar, which modifies and detoxifies glufosinate.

An enzyme capable of acetylating the primary amine of AMPA is reported in PCT Application No. WO 00/29596. The enzyme was not described as being able to acetylate a compound with a secondary amine (e.g., glyphosate).

While a variety of herbicide resistance strategies are available as noted above, additional approaches would have considerable commercial value. The present invention provides novel polynucleotides and polypeptides for conferring herbicide tolerance, as well as numerous other benefits as will become apparent during review of the disclosure.

SUMMARY OF THE INVENTION

The present invention provides methods and reagents for rendering an organism, such as a plant, resistant to glyphosate by one or more of the embodiments described below.

One embodiment of the invention provides novel polypeptides referred to herein as glyphosate-N-acetyltransferase (“GAT”) polypeptides. GAT polypeptides are characterized by their structural similarity to one another, e.g., in terms of sequence similarity when the GAT polypeptides are aligned with one another. GAT polypeptides of the present invention possess glyphosate-N-acetyltransferase activity, i.e., the ability to catalyze the acetylation of glyphosate. These GAT polypeptides transfer the acetyl group from acetyl CoA to the N of glyphosate. In addition, some GAT polypeptides transfer the propionyl group of propionyl CoA to the N of glyphosate. Some GAT polypeptides are also capable of catalyzing the acetylation of glyphosate analogs and/or glyphosate metabolites, e.g., aminomethylphosphonic acid. Exemplary GAT polypeptides correspond to SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972.

Also provided are novel polynucleotides referred to herein as GAT polynucleotides, e.g., SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952. GAT polynucleotides are characterized by their ability to encode GAT polypeptides. In some embodiments of the invention, a GAT polynucleotide is engineered for better plant expression by replacing one or more parental codons with a synonymous codon that is preferentially used in plants relative to the parental codon. In other embodiments, a GAT polynucleotide is modified by the introduction of a nucleotide sequence encoding an N-terminal chloroplast transit peptide. In other embodiments, a GAT polynucleotide is modified by the insertion of one or more G+C containing codons (such as GCG or GCT) immediately downstream of and adjacent to the initiating Met codon.
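The synonymous-codon replacement described above can be sketched in Python. This is a minimal illustration, not the procedure used to build the listed sequences: the preference map `PLANT_PREFERRED` is hypothetical, while the codon table is the standard genetic code. The check inside `replace_codons` guards the defining property of the substitution, namely that the encoded polypeptide does not change.

```python
# Standard genetic code (NCBI translation table 1), built from the usual
# TCAG-ordered amino acid string; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# HYPOTHETICAL preference map: parental codon -> synonymous codon assumed,
# for illustration only, to be used more often in plants. A real map would
# be derived from plant codon-usage data.
PLANT_PREFERRED = {"TTA": "CTG", "CTA": "CTG", "AGG": "AGA", "GGG": "GGC"}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def replace_codons(cds: str, preferred: dict) -> str:
    """Swap each codon found in `preferred` for its mapped codon.

    Raises ValueError if the sequence is not in frame or if a mapping
    would change the encoded amino acid (i.e., is not synonymous).
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        new = preferred.get(codon, codon)
        if CODON_TABLE[new] != CODON_TABLE[codon]:
            raise ValueError(f"{codon} -> {new} is not a synonymous substitution")
        out.append(new)
    return "".join(out)


parental = "ATGTTAGGGTAA"  # toy sequence: Met-Leu-Gly-stop
optimized = replace_codons(parental, PLANT_PREFERRED)
print(optimized)                                    # ATGCTGGGCTAA
print(translate(parental) == translate(optimized))  # True
```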

GAT polypeptides, GAT polynucleotides and glyphosate-N-acetyltransferase activity are described in more detail below. The invention further includes certain fragments of the GAT polypeptides and GAT polynucleotides described herein.

The invention includes non-native variants of the polypeptides and polynucleotides described herein, wherein one or more amino acids of the encoded polypeptide have been mutated.

In certain preferred embodiments, the GAT polypeptides of the present invention are characterized as follows. When optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, one or more of the following positions conform to the following restrictions: (i) at positions 18 and 38, a Z5 amino acid residue; (ii) at position 62, a Z1 amino acid residue; (iii) at position 124, a Z6 amino acid residue; and (iv) at position 144, a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P.
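Once a query has been aligned to the reference, the positional restrictions reduce to a lookup against the residue classes. The sketch below assumes the BLOSUM62 alignment (gap existence 11, gap extension 1) has already been computed elsewhere and reduced to a map from reference position to the aligned query residue; the function name and that input format are illustrative assumptions.

```python
# Residue classes as defined in the text.
Z1 = frozenset("AILMV")
Z2 = frozenset("FWY")
Z5 = frozenset("DE")
Z6 = frozenset("CGP")

# Reference position (1-based) -> allowed residue class.
RESTRICTIONS = {18: Z5, 38: Z5, 62: Z1, 124: Z6, 144: Z2}


def check_restrictions(residue_at: dict) -> dict:
    """Report, per restricted position, whether the query residue conforms.

    `residue_at` maps a reference position to the query residue aligned to it
    (use "-" or omit the key for a gap). Producing this map requires the
    optimal alignment described above, which is assumed to be done already.
    """
    return {pos: residue_at.get(pos, "-") in allowed
            for pos, allowed in RESTRICTIONS.items()}


# Toy example with hypothetical residues at the five positions.
result = check_restrictions({18: "D", 38: "E", 62: "L", 124: "G", 144: "A"})
print(result)                # {18: True, 38: True, 62: True, 124: True, 144: False}
print(any(result.values()))  # True: "one or more" positions conform
```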

The invention further provides an isolated or recombinant polypeptide comprising an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:712; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590.

The invention further provides an isolated or recombinant polypeptide comprising an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 96% identical to positions 2-146 of SEQ ID NO:919 (such as, for example, SEQ ID NO:917, 919, 921, 923, 925, 927, 833, 835, 839, 843, 845, 859, 863, 873, 877, 891, 895, 901, 905, 907, 913, 915, or 950); (b) an amino acid sequence that is at least 97% identical to positions 2-146 of SEQ ID NO:929 (such as, for example, SEQ ID NO:929, 931, 835, 843, 849, or 867); (c) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:847 (such as, for example, SEQ ID NO:845 or 847); (d) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:851; (e) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:853; (f) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:855 (such as, for example, SEQ ID NO:835 or 855); (g) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:857; (h) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:861 (such as, for example, SEQ ID NO:839, 861, or 883); (i) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:871; (j) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:875; (k) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:881; (l) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:885 (such as, for example, SEQ ID NO:845 or 885); (m) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:887; (n) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:889 (such as, for example, SEQ ID NO: 863, 889, 891, or 903); (o) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:893; (p) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:897; (q) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:899; (r) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:909 (such as, for example, SEQ ID NO:883 or 909); (s) an amino acid sequence that is at least 98% identical to positions 2-146 of SEQ ID NO:911; (t) an amino acid sequence that is at least 99% identical to positions 2-146 of SEQ ID NO:837; (u) an amino acid sequence that is at least 99% identical to positions 2-146 of SEQ ID NO:841; (v) an amino acid sequence that is at least 99% identical to positions 2-146 of SEQ ID NO:865; (w) an amino acid sequence that is at least 99% identical to positions 2-146 of SEQ ID NO:869; and (x) an amino acid sequence that is at least 99% identical to positions 2-146 of SEQ ID NO:879. In some embodiments of the invention, the amino acid sequence of the polypeptide comprises Met, Met-Ala, or Met-Ala-Ala on the N-terminal side of the amino acid corresponding to position 2 of the reference amino acid sequence.
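Identity "to positions 2-146" of a reference depends on how gaps are counted, and the disclosure's own definition of identity should govern. Under one common convention, assumed here, the denominator is the number of reference residues in the window, insertions in the query are ignored, and a gap in the query counts as a mismatch. A minimal sketch on a pre-computed pairwise alignment (the function name is hypothetical):

```python
def window_identity(ref_aln: str, query_aln: str, start: int = 2, end: int = 146) -> float:
    """Percent identity over reference positions start..end (1-based, inclusive).

    Inputs are the two rows of a pairwise alignment (equal length, "-" = gap).
    Assumed convention: denominator = reference residues in the window;
    query insertions (gaps in the reference row) are skipped; a gap in the
    query row counts as a mismatch.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    ref_pos = matches = window_len = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion in the query relative to the reference
        ref_pos += 1
        if start <= ref_pos <= end:
            window_len += 1
            matches += (q == r)  # bool counts as 0 or 1
    if window_len == 0:
        raise ValueError("window lies outside the reference sequence")
    return 100.0 * matches / window_len


# Toy alignment, using window 2-5 instead of 2-146 to keep it readable.
ident = window_identity("MA-KLV", "MAGKIV", start=2, end=5)
print(ident)          # 75.0
print(ident >= 98.0)  # False: would not meet a 98% threshold
```

Because the window starts at reference position 2, residues added ahead of it (for example, the Met-Ala or Met-Ala-Ala N-termini mentioned above) do not affect the score under this convention.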

The invention further provides an isolated or recombinant polypeptide comprising an amino acid sequence that is at least 95% identical to positions 2-146 of SEQ ID NO:929 and which comprises a Gly or an Asn residue at the amino acid position corresponding to position 33 of SEQ ID NO:929 (such as, for example, SEQ ID NO:837, 849, 893, 897, 905, 921, 927, 929 or 931). In some embodiments of the invention, the amino acid sequence of the polypeptide comprises Met, Met-Ala, or Met-Ala-Ala on the N-terminal side of the amino acid corresponding to position 2 of the reference amino acid sequence.
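Testing for Gly or Asn "at the amino acid position corresponding to position 33" requires mapping a reference coordinate through the alignment, because insertions in the query shift its raw index. A hedged sketch, again assuming the pairwise alignment is already available as two equal-length gapped strings (function names are illustrative):

```python
def residue_at_reference_position(ref_aln: str, query_aln: str, ref_position: int) -> str:
    """Return the query character aligned to a 1-based reference position.

    Gaps in the reference row are skipped when counting; a "-" return value
    means the query has a deletion at that position.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            pos += 1
            if pos == ref_position:
                return q
    raise ValueError(f"reference has fewer than {ref_position} residues")


def has_gly_or_asn_at_33(ref_aln: str, query_aln: str) -> bool:
    """Residue restriction of the embodiment above (position 33 of the reference)."""
    return residue_at_reference_position(ref_aln, query_aln, 33) in {"G", "N"}


# Toy rows: the query carries a 2-residue insertion before reference position 33,
# so the residue of interest sits at a different raw index in the query.
ref = "A" * 20 + "--" + "A" * 12 + "K" + "A" * 5
qry = "A" * 20 + "SS" + "A" * 12 + "N" + "A" * 5
print(residue_at_reference_position(ref, qry, 33))  # N
print(has_gly_or_asn_at_33(ref, qry))               # True
```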

The invention further provides a nucleic acid construct comprising a polynucleotide of the invention. The construct can be a vector, such as a plant transformation vector. In some aspects a vector of the invention will comprise a T-DNA sequence. The construct can optionally include a regulatory sequence (e.g., a promoter) operably linked to a GAT polynucleotide, where the promoter is heterologous with respect to the polynucleotide and effective to cause sufficient expression of the encoded polypeptide to enhance the glyphosate tolerance of a plant cell transformed with the nucleic acid construct.

In some aspects of the invention, a GAT polynucleotide functions as a selectable marker, e.g., in a plant, bacterium, actinomycete, yeast or other fungus, or alga. For example, an organism that has been transformed with a vector including a GAT polynucleotide selectable marker can be selected based on its ability to grow in the presence of glyphosate. A GAT marker gene can be used for selection or screening for transformed cells expressing the gene.

The invention further provides vectors with stacked traits, i.e., vectors that encode a GAT polypeptide and that also include a second polynucleotide sequence encoding a second polypeptide that confers a detectable phenotypic trait upon a cell or organism expressing the second polypeptide at an effective level, for example disease resistance or pest resistance. The detectable phenotypic trait can also function as a selectable marker, e.g., by conferring herbicide resistance or by providing some sort of visible marker.

In one embodiment, the invention provides a composition comprising two or more polynucleotides of the invention. Preferably, the GAT polynucleotides encode GAT polypeptides having different kinetic parameters, e.g., a GAT variant having a lower K_(m) can be combined with one having a higher k_(cat). In a further embodiment, the different GAT polynucleotides may be coupled to a chloroplast transit sequence or other signal sequence, thereby providing expression of the GAT polypeptides in different cellular compartments or organelles, or secretion of one or more of the GAT polypeptides.
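The benefit of pairing a low-K_(m) variant with a high-k_(cat) variant follows from the Michaelis-Menten rate law, v = k_(cat)[E][S]/(K_(m) + [S]). The sketch below uses hypothetical parameters, not values for any listed sequence, and assumes the two enzymes act independently on a substrate pool that is not significantly depleted, so their rates simply add. The class name `GatVariant` is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatVariant:
    name: str
    km_mM: float       # Michaelis constant for glyphosate (mM)
    kcat_per_s: float  # turnover number (1/s)

    def rate(self, glyphosate_mM: float, enzyme_uM: float = 1.0) -> float:
        """Michaelis-Menten rate v = kcat*[E]*[S]/(Km+[S]), in uM/s."""
        return self.kcat_per_s * enzyme_uM * glyphosate_mM / (self.km_mM + glyphosate_mM)


# HYPOTHETICAL parameters, chosen only to illustrate the trade-off.
low_km = GatVariant("low-Km", km_mM=0.5, kcat_per_s=1.0)
high_kcat = GatVariant("high-kcat", km_mM=3.0, kcat_per_s=5.0)

for s in (0.05, 10.0):
    a, b = low_km.rate(s), high_kcat.rate(s)
    print(f"[S]={s} mM  low-Km={a:.3f}  high-kcat={b:.3f}  combined={a + b:.3f} uM/s")
```

With these made-up numbers the low-K_(m) variant is faster at 0.05 mM glyphosate and the high-k_(cat) variant is faster at 10 mM, so the combination performs reasonably in both regimes.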

Accordingly, compositions containing two or more GAT polynucleotides or encoded polypeptides are a feature of the invention. In some cases, these compositions are libraries of nucleic acids containing, e.g., at least 3 such nucleic acids. Compositions produced by digesting the nucleic acids of the invention with a restriction endonuclease, a DNAse or an RNAse, or otherwise fragmenting the nucleic acids, e.g., by mechanical shearing, chemical cleavage, etc., are also a feature of the invention, as are compositions produced by incubating a nucleic acid of the invention with deoxyribonucleotide triphosphates and a nucleic acid polymerase, such as a thermostable nucleic acid polymerase.

Cells transduced by a vector of the invention, or which otherwise incorporate a nucleic acid of the invention, are an aspect of the invention. In a preferred embodiment, the cells express a polypeptide encoded by the nucleic acid of the invention.

In some embodiments, the cells incorporating the nucleic acids of the invention are plant cells. Transgenic plants, transgenic plant cells, and transgenic plant explants incorporating the nucleic acids of the invention are also a feature of the invention. In some embodiments, the transgenic plants, transgenic plant cells, or transgenic plant explants express an exogenous polypeptide with glyphosate-N-acetyltransferase activity encoded by the nucleic acid of the invention. The invention also provides transgenic seeds produced by the transgenic plants of the invention.

The invention further provides transgenic plants, transgenic plant cells, transgenic plant explants, or transgenic seeds having enhanced tolerance to glyphosate due to the expression of a polypeptide with glyphosate-N-acetyltransferase activity and a polypeptide that imparts glyphosate tolerance by another mechanism, such as a glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase and/or a glyphosate-tolerant glyphosate oxido-reductase. In a further embodiment, the invention provides transgenic plants or transgenic plant explants having enhanced tolerance to glyphosate, as well as tolerance to an additional herbicide, due to the expression of a polypeptide with glyphosate-N-acetyltransferase activity, a polypeptide that imparts glyphosate tolerance by another mechanism, such as a glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase and/or a glyphosate-tolerant glyphosate oxido-reductase, and a polypeptide imparting tolerance to the additional herbicide, such as a mutated hydroxyphenylpyruvate dioxygenase, a sulfonamide-tolerant acetolactate synthase, a sulfonamide-tolerant acetohydroxy acid synthase, an imidazolinone-tolerant acetolactate synthase, an imidazolinone-tolerant acetohydroxy acid synthase, a phosphinothricin acetyltransferase, or a mutated protoporphyrinogen oxidase.

The invention also provides transgenic plants, transgenic plant cells, transgenic plant explants, or transgenic seeds having enhanced tolerance to glyphosate, as well as tolerance to an additional herbicide, due to the expression of a polypeptide with glyphosate-N-acetyltransferase activity and a polypeptide imparting tolerance to an additional herbicide, such as a mutated hydroxyphenylpyruvate dioxygenase, a sulfonamide-tolerant acetolactate synthase, a sulfonamide-tolerant acetohydroxy acid synthase, an imidazolinone-tolerant acetolactate synthase, an imidazolinone-tolerant acetohydroxy acid synthase, a phosphinothricin acetyltransferase, or a mutated protoporphyrinogen oxidase. The invention also provides transgenic plants, transgenic plant cells, transgenic plant explants, or transgenic seeds having enhanced tolerance to glyphosate as well as additional desirable traits which may be conferred by one or more additional transgenes.

Methods of producing the polypeptides of the invention by introducing the nucleic acids encoding them into cells, expressing them, and optionally recovering them from the cells or culture medium are a feature of the invention. In preferred embodiments, the cells expressing the polypeptides of the invention are transgenic plant cells.

Methods of increasing the expression level of a polypeptide of the invention in a plant or plant cell are also a feature of the invention. Such methods include inserting into the polypeptide coding sequence one or two G/C-rich codons (such as GCG or GCT) immediately adjacent to and downstream of the initiating methionine ATG codon, and/or replacing one or more codons in the coding sequence that are less frequently utilized in plants with synonymous codons that are more frequently utilized in plants, and then introducing the modified coding sequence into a plant or plant cell and expressing it.
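Inserting G/C-rich codons directly after the start codon is a simple string operation on the coding sequence. GCG and GCT both encode alanine in the standard genetic code, so one or two such insertions give the Met-Ala or Met-Ala-Ala N-termini mentioned earlier. A minimal sketch (the function name is an illustrative assumption):

```python
def insert_after_start(cds: str, extra: tuple = ("GCG",)) -> str:
    """Insert one or two codons immediately downstream of the initiating ATG.

    `extra` defaults to a single GCG; GCT is the other example in the text.
    """
    cds = cds.upper()
    extra = tuple(c.upper() for c in extra)
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must begin with the ATG start codon")
    if len(extra) not in (1, 2):
        raise ValueError("insert one or two codons")
    for codon in extra:
        if len(codon) != 3 or set(codon) - set("ACGT"):
            raise ValueError(f"not a valid codon: {codon!r}")
    # Keep the ATG, add the new codons, then the rest of the original sequence,
    # so the downstream reading frame is preserved.
    return cds[:3] + "".join(extra) + cds[3:]


print(insert_after_start("ATGAAATAA", ("GCG", "GCT")))  # ATGGCGGCTAAATAA
```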

Polypeptides that are specifically bound by a polyclonal antiserum that reacts against an antigen derived from SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972 but not to a naturally occurring related sequence, e.g., a peptide represented by a subsequence of GenBank accession number CAA70664, as well as antibodies which are produced by administering an antigen derived from any one or more of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972 and/or which bind specifically to such antigens and which do not specifically bind to a naturally occurring polypeptide corresponding to those of GenBank accession number CAA70664, are all features of the invention.

Another aspect of the invention relates to methods of polynucleotide diversification to produce novel GAT polynucleotides and polypeptides by recombining or mutating the nucleic acids of the invention in vitro or in vivo. In an embodiment, the recombination produces at least one library of recombinant GAT polynucleotides. The libraries so produced are embodiments of the invention, as are cells comprising the libraries. Furthermore, methods of producing a modified GAT polynucleotide by mutating a nucleic acid of the invention are embodiments of the invention. Recombinant and mutant GAT polynucleotides and polypeptides produced by the methods of the invention are also embodiments of the invention.

In some aspects of the invention, diversification is achieved by using recursive recombination, which can be accomplished in vitro, in vivo, in silico, or a combination thereof. Some examples of diversification methods described in more detail below are family shuffling methods and synthetic shuffling methods.

The invention provides methods for producing a glyphosate-resistant transgenic plant or plant cell that involve transforming a plant or plant cell with a polynucleotide encoding a glyphosate-N-acetyltransferase, and optionally regenerating a transgenic plant from the transformed plant cell. In some aspects the polynucleotide is a GAT polynucleotide, optionally a GAT polynucleotide derived from a bacterial source. In some aspects of the invention, the method can comprise growing the transformed plant or plant cell in a concentration of glyphosate that inhibits the growth of a wild-type plant of the same species without inhibiting the growth of the transformed plant. The method can comprise growing the transformed plant or plant cell or progeny of the plant or plant cell in increasing concentrations of glyphosate and/or in a concentration of glyphosate that is lethal to a wild-type plant or plant cell of the same species. A glyphosate-resistant transgenic plant produced by this method can be propagated, for example by crossing it with a second plant, such that at least some progeny of the cross display glyphosate tolerance.
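The in silico form of recursive recombination mentioned above can be illustrated with a toy model: chimeras are assembled from segments of aligned parent sequences, screened with a scoring function, and the best ones seed the next round. This is a deliberately simplified sketch, not the family or synthetic shuffling protocols of the disclosure; `score` stands in for a real screen, and all names and parameters are hypothetical.

```python
import random


def shuffle_parents(parents: list, n_crossovers: int, rng: random.Random) -> str:
    """Build one chimera from equal-length, pre-aligned parent sequences.

    The sequence is cut at `n_crossovers` random points and each segment is
    copied from a randomly chosen parent.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    if not 0 <= n_crossovers < length:
        raise ValueError("need 0 <= n_crossovers < sequence length")
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0, *cuts, length]
    return "".join(rng.choice(parents)[a:b] for a, b in zip(bounds, bounds[1:]))


def recursive_shuffle(parents, score, rounds=3, library_size=50, keep=4, seed=0):
    """Repeat recombination and selection; `score` is a HYPOTHETICAL screen."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        library = [shuffle_parents(pool, 2, rng) for _ in range(library_size)]
        # Deduplicate while keeping order, then keep the best-scoring
        # sequences (previous parents included) as the next round's parents.
        candidates = list(dict.fromkeys(library + pool))
        pool = sorted(candidates, key=score, reverse=True)[:keep]
    return pool


parents = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]
best = recursive_shuffle(parents, score=lambda s: s.count("C"))
print(best[0])  # CCCCCCCCCC
```

Keeping the previous parents in each selection step (elitism) ensures the best sequence found so far is never lost between rounds.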

The invention further provides methods for selectively controlling weeds in a field containing a crop that involve planting the field with crop seeds or plants which are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate N-acetyltransferase, and applying to the crop and weeds in the field a sufficient amount of glyphosate to control the weeds without significantly affecting the crop.

The invention further provides methods for controlling weeds in a field and preventing the emergence of glyphosate-resistant weeds in a field containing a crop which involve planting the field with crop seeds or plants that are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate-N-acetyltransferase and a gene encoding a polypeptide imparting glyphosate tolerance by another mechanism, such as a glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase and/or a glyphosate-tolerant glyphosate oxido-reductase, and applying to the crop and the weeds in the field a sufficient amount of glyphosate to control the weeds without significantly affecting the crop.

In a further embodiment the invention provides methods for controlling weeds in a field and preventing the emergence of herbicide resistant weeds in a field containing a crop which involve planting the field with crop seeds or plants that are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate-N-acetyltransferase, a gene encoding a polypeptide imparting glyphosate tolerance by another mechanism, such as a glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase and/or a glyphosate-tolerant glyphosate oxido-reductase, and a gene encoding a polypeptide imparting tolerance to an additional herbicide, such as a mutated hydroxyphenylpyruvate dioxygenase, a sulfonamide-tolerant acetolactate synthase, a sulfonamide-tolerant acetohydroxy acid synthase, an imidazolinone-tolerant acetolactate synthase, an imidazolinone-tolerant acetohydroxy acid synthase, a phosphinothricin acetyltransferase, or a mutated protoporphyrinogen oxidase, and applying to the crop and the weeds in the field a sufficient amount of glyphosate and an additional herbicide, such as a hydroxyphenylpyruvate dioxygenase inhibitor, a sulfonamide, an imidazolinone, bialaphos, phosphinothricin, azafenidin, butafenacil, sulfosate, glufosinate, or a protox inhibitor, to control the weeds without significantly affecting the crop.

The invention further provides methods for controlling weeds in a field and preventing the emergence of herbicide resistant weeds in a field containing a crop which involve planting the field with crop seeds or plants that are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate-N-acetyltransferase and a gene encoding a polypeptide imparting tolerance to an additional herbicide, such as a mutated hydroxyphenylpyruvate dioxygenase, a sulfonamide-tolerant acetolactate synthase, a sulfonamide-tolerant acetohydroxy acid synthase, an imidazolinone-tolerant acetolactate synthase, an imidazolinone-tolerant acetohydroxy acid synthase, a phosphinothricin acetyltransferase, or a mutated protoporphyrinogen oxidase, and applying to the crop and the weeds in the field a sufficient amount of glyphosate and an additional herbicide, such as a hydroxyphenylpyruvate dioxygenase inhibitor, a sulfonamide, an imidazolinone, bialaphos, phosphinothricin, azafenidin, butafenacil, sulfosate, glufosinate, or a protox inhibitor, to control the weeds without significantly affecting the crop.

The invention further provides methods for producing a genetically transformed plant that is tolerant to glyphosate that involve inserting into the genome of a plant cell a recombinant, double-stranded DNA molecule comprising: (i) a promoter which functions in plant cells to cause the production of an RNA sequence; (ii) a structural DNA sequence that causes the production of an RNA sequence which encodes a GAT; and (iii) a 3′ non-translated region which functions in plant cells to cause the addition of a stretch of polyadenyl nucleotides to the 3′ end of the RNA sequence; where the promoter is heterologous with respect to the structural DNA sequence and adapted to cause sufficient expression of the encoded polypeptide to enhance the glyphosate tolerance of a plant cell transformed with the DNA molecule; obtaining a transformed plant cell; and regenerating from the transformed plant cell a genetically transformed plant which has increased tolerance to glyphosate.
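The three elements of the DNA molecule, (i) promoter, (ii) GAT-encoding structural sequence and (iii) 3′ non-translated region, can be modeled as an ordered record with basic reading-frame checks on element (ii). This is a hypothetical illustration rather than a design tool: the promoter string in the example is a placeholder, and AATAAA (the canonical polyadenylation signal hexamer) stands in for a real 3′ region.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class ExpressionCassette:
    """Elements (i)-(iii) of the DNA molecule, ordered 5' to 3'."""
    promoter: str         # (i) plant-functional promoter (placeholder here)
    coding_sequence: str  # (ii) structural sequence encoding a GAT
    three_prime: str      # (iii) 3' non-translated region (placeholder here)

    def cds_problems(self) -> list:
        """Basic reading-frame sanity checks on the structural sequence."""
        cds = self.coding_sequence.upper()
        problems = []
        if not cds.startswith("ATG"):
            problems.append("missing ATG start codon")
        if not cds or len(cds) % 3:
            problems.append("length is not a positive multiple of 3")
        else:
            codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
            if codons[-1] not in STOP_CODONS:
                problems.append("missing terminal stop codon")
            if any(c in STOP_CODONS for c in codons[:-1]):
                problems.append("internal stop codon")
        return problems

    def assemble(self) -> str:
        """Concatenate the elements 5' to 3' after the CDS passes its checks."""
        problems = self.cds_problems()
        if problems:
            raise ValueError("; ".join(problems))
        return (self.promoter + self.coding_sequence + self.three_prime).upper()


cassette = ExpressionCassette("NNNNNNNNNN", "ATGGCGAAATAA", "AATAAA")
print(cassette.assemble())  # NNNNNNNNNNATGGCGAAATAAAATAAA
```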

The invention further provides methods for producing a crop that involve growing a crop plant that is glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate-N-acetyltransferase, under conditions such that the crop plant produces a crop, and harvesting a crop from the crop plant. These methods often include applying glyphosate to the crop plant at a concentration effective to control weeds. Exemplary crop plants include cotton, corn, and soybean.

The invention also provides computers, computer-readable media, and integrated systems, including databases composed of sequence records that include character strings corresponding to SEQ ID NO: 1-10, 16, 48, 190, 193, 196, 202, 205, 268, 300, 442, 445, 448, 454, 457, 515-830 and 832-972. Such integrated systems optionally include one or more instruction sets for selecting, aligning, translating, reverse-translating or viewing any one or more character strings corresponding to SEQ ID NO: 1-10, 16, 48, 190, 193, 196, 202, 205, 268, 300, 442, 445, 448, 454, 457, 515-830 and 832-972, with each other and/or with any additional nucleic acid or amino acid sequence.
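As a rough illustration of the "translating" and "reverse-translating" operations such an instruction set might perform on character strings, the following Python sketch uses the standard genetic code. The function names, and the choice to reverse-translate each amino acid to the first matching codon in the table, are assumptions made for illustration only; practical tools usually weight codon choice by the codon usage of the intended host.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each of
# the three positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical back-translation table: the first codon listed for each residue.
BACK_TABLE: dict[str, str] = {}
for codon, aa in CODON_TABLE.items():
    BACK_TABLE.setdefault(aa, codon)


def translate(dna: str) -> str:
    """Translate a DNA string in the first frame, ignoring a trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_translate(protein: str) -> str:
    """Return one DNA string (of many possible) that encodes the protein."""
    return "".join(BACK_TABLE[aa] for aa in protein.upper())
```

For example, `translate("ATGATTGAA")` returns `"MIE"`, and translating the output of `reverse_translate` recovers the original protein string.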

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 depicts the N-acetylation of glyphosate catalyzed by a glyphosate-N-acetyltransferase (“GAT”).

FIG. 2 illustrates mass spectrometric detection of N-acetylglyphosate produced by an exemplary Bacillus culture expressing a native GAT activity. Relative abundance is shown on the vertical axis.

FIG. 3 is a table illustrating the relative identity between GAT sequences isolated from different strains of bacteria and yitI from Bacillus subtilis.

FIG. 4 is a map of the plasmid pMAXY2120 for expression and purification of the GAT enzyme from E. coli cultures.

FIG. 5 is a mass spectrometry output showing increased N-acetylglyphosate production over time in a typical GAT enzyme reaction mix.

FIG. 6 is a plot of the kinetic data of a GAT enzyme from which a K_(M) of 2.9 mM for glyphosate was calculated.

FIG. 7 is a plot of kinetic data, taken from the data of FIG. 6, from which a K_(M) of 2 μM was calculated for acetyl CoA.

FIG. 8 is a scheme that describes the degradation of glyphosate in soil through the AMPA pathway.

FIG. 9 is a scheme that describes the sarcosine pathway of glyphosate degradation.

FIG. 10 is the BLOSUM62 matrix.

FIG. 11 is a map of the plasmid pMAXY2190.

FIG. 12 depicts a T-DNA construct with gat selectable marker.

FIG. 13 depicts a yeast expression vector with gat selectable marker.

FIG. 14 illustrates the effect of glyphosate on plant height at tasseling.

FIGS. 15A and 15B provide a comparison of the kinetic parameters K_(M) and k_(cat)/K_(M), respectively, for various GAT enzymes assayed either in the absence of added KCl (unshaded bars) or in the presence of 20 mM KCl (shaded bars), as described in Example 18. Error bars represent the standard deviation of multiple assays, where available.

FIGS. 16A, 16B and 16C provide a comparison of the kinetic parameters K_(M), k_(cat), and k_(cat)/K_(M), respectively, of various GAT enzymes of the invention (unshaded bars) to the kinetic parameters of some further evolved GAT enzymes of the invention (shaded bars), as described in Example 19. Error bars represent the standard deviation of multiple assays, where available.

FIG. 17 depicts remaining GAT activity after incubation at various temperatures as described in Example 16.

FIG. 18 depicts the effect of pH on k_(cat) and K_(M) as described in Example 30.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a novel class of enzymes exhibiting N-acetyltransferase activity. In one aspect, the invention relates to a novel class of enzymes capable of acetylating glyphosate and glyphosate analogs, e.g., enzymes possessing glyphosate-N-acetyltransferase (“GAT”) activity. Such enzymes are characterized by the ability to acetylate the secondary amine of a compound. In some aspects of the invention, this compound is an herbicide, e.g., glyphosate, as illustrated schematically in FIG. 1. This compound can also be a glyphosate analog or a metabolic product of glyphosate degradation, e.g., aminomethylphosphonic acid (AMPA), in which case the acetylated amine is primary. Although the acetylation of glyphosate is a key catalytic step in one metabolic pathway for catabolism of glyphosate, the enzymatic acetylation of glyphosate by naturally occurring, isolated, or recombinant enzymes has not been previously described. Thus, the nucleic acids and polypeptides of the invention provide a new biochemical pathway for engineering herbicide resistance.

In one aspect, the invention provides novel genes encoding GAT polypeptides. Isolated and recombinant GAT polynucleotides corresponding to naturally occurring polynucleotides, as well as recombinant and engineered, e.g., diversified, GAT polynucleotides are a feature of the invention. GAT polynucleotides are exemplified by SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952. Specific GAT polynucleotide and polypeptide sequences are provided as examples to help illustrate the invention, and are not intended to limit the scope of the genus of GAT polynucleotides and polypeptides described and/or claimed herein.

The invention also provides methods for generating and selecting diversified libraries to produce additional GAT polynucleotides, including polynucleotides encoding GAT polypeptides with improved and/or enhanced characteristics, e.g., altered K_(M) for glyphosate, increased rate of catalysis, increased stability, etc., based upon selection of a polynucleotide constituent of the library for the new or improved activities described herein. Such polynucleotides are particularly useful in the production of glyphosate-resistant transgenic plants.

The GAT polypeptides of the invention exhibit a novel enzymatic activity. Specifically, the enzymatic acetylation of the synthetic herbicide glyphosate had not been recognized prior to the present invention. Thus, the polypeptides described herein, e.g., as exemplified by SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 946, 948, 950, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972, define a novel biochemical pathway for the detoxification of glyphosate that is functional in vivo, e.g., in plants.

Accordingly, the nucleic acids and polypeptides of the invention are of significant utility in the generation of glyphosate-resistant plants, providing new nucleic acids, polypeptides and biochemical pathways for the engineering of herbicide selectivity in transgenic plants.

DEFINITIONS

Before describing the present invention in detail, it is to be understood that this invention is not limited to particular compositions or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a device” includes a combination of two or more such devices, reference to “a gene fusion construct” includes mixtures of constructs, and the like.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, specific examples of appropriate materials and methods are described herein.

In describing and claiming the present invention, the following terminology will be used in accordance with the definitions set out below.

For purposes of the present invention, the term “glyphosate” should be considered to include any herbicidally effective form of N-phosphonomethylglycine (including any salt thereof) and other forms which result in the production of the glyphosate anion in planta. The term “glyphosate analog” refers to any structural analog of glyphosate that has the ability to inhibit EPSPS at levels such that the glyphosate analog is herbicidally effective.

As used herein, the term “glyphosate-N-acetyltransferase activity” or “GAT activity” refers to the ability to catalyze the acetylation of the secondary amine group of glyphosate, as illustrated, for example, in FIG. 1. A “glyphosate-N-acetyltransferase” or “GAT” is an enzyme that catalyzes the acetylation of the amine group of glyphosate, a glyphosate analog, and/or a glyphosate primary metabolite (i.e., AMPA or sarcosine). In some preferred embodiments of the invention, a GAT is able to transfer the acetyl group from acetyl CoA to the secondary amine of glyphosate and the primary amine of AMPA. In addition, some GATs are also able to transfer the propionyl group of propionyl CoA to glyphosate, indicating that GAT is also an acyltransferase. The exemplary GATs described herein are active from about pH 5 to 9, with optimal activity in the range of about pH 6.5 to 8.0. Activity can be quantified using various kinetic parameters which are well known in the art, e.g., k_(cat), K_(M), and k_(cat)/K_(M). These kinetic parameters can be determined as described below in Example 7 or Example 19.
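Examples 7 and 19 are not reproduced in this section. As a generic illustration only, the Python sketch below estimates k_(cat) and K_(M) from initial-rate measurements using the Hanes-Woolf linearization, in which [S]/v is plotted against [S]; this is one common textbook approach and is not necessarily the procedure used in those Examples. The function name and the synthetic data in the usage note are assumptions.

```python
def hanes_woolf_fit(substrate: list[float], rates: list[float],
                    enzyme: float) -> tuple[float, float]:
    """Estimate (k_cat, K_M) from initial rates by a Hanes-Woolf fit.

    Under Michaelis-Menten kinetics, [S]/v = [S]/Vmax + K_M/Vmax, so an
    ordinary least-squares line of [S]/v against [S] has slope 1/Vmax and
    intercept K_M/Vmax. k_cat is then Vmax divided by total enzyme [E].
    Units follow the inputs, e.g. [S] and [E] in mM, v in mM/min.
    """
    x = substrate
    y = [s / v for s, v in zip(substrate, rates)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    v_max = 1.0 / slope
    k_m = intercept * v_max
    return v_max / enzyme, k_m
```

With noise-free synthetic rates generated from k_(cat) = 6 min⁻¹ and K_(M) = 2.9 mM, the fit returns those two values. With real, noisy data, a direct nonlinear fit of the Michaelis-Menten equation is often preferred because linearizations distort the error structure.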

The terms “polynucleotide,” “nucleotide sequence,” and “nucleic acid” are used to refer to a polymer of nucleotides (A, C, T, U, G, etc. or naturally occurring or artificial nucleotide analogues), e.g., DNA or RNA, or a representation thereof, e.g., a character string, etc., depending on the relevant context. A given polynucleotide or complementary polynucleotide can be determined from any specified nucleotide sequence.

Similarly, an “amino acid sequence” is a polymer of amino acids (a protein, polypeptide, etc.) or a character string representing an amino acid polymer, depending on context. The terms “protein,” “polypeptide,” and “peptide” are used interchangeably herein.

A polynucleotide, polypeptide, or other component is “isolated” when it is partially or completely separated from components with which it is normally associated (other proteins, nucleic acids, cells, synthetic reagents, etc.). A nucleic acid or polypeptide is “recombinant” when it is artificial or engineered, or derived from an artificial or engineered protein or nucleic acid. For example, a polynucleotide that is inserted into a vector or any other heterologous location, e.g., in a genome of a recombinant organism, such that it is not associated with nucleotide sequences that normally flank the polynucleotide as it is found in nature is a recombinant polynucleotide. A protein expressed in vitro or in vivo from a recombinant polynucleotide is an example of a recombinant polypeptide. Likewise, a polynucleotide sequence that does not appear in nature, for example a variant of a naturally occurring gene, is recombinant.

The terms “glyphosate-N-acetyltransferase polypeptide” and “GAT polypeptide” are used interchangeably to refer to any of a family of novel polypeptides provided herein.

The terms “glyphosate-N-acetyltransferase polynucleotide” and “GAT polynucleotide” are used interchangeably to refer to a polynucleotide that encodes a GAT polypeptide.

A “subsequence” or “fragment” is any portion of an entire sequence.

Numbering of an amino acid or nucleotide polymer corresponds to numbering of a selected amino acid polymer or nucleic acid when the position of a given monomer component (amino acid residue, incorporated nucleotide, etc.) of the polymer corresponds to the same residue position in a selected reference polypeptide or polynucleotide.

A vector is a composition for facilitating cell transduction/transformation by a selected nucleic acid, or expression of the nucleic acid in the cell. Vectors include, e.g., plasmids, cosmids, viruses, YACs, bacteria, poly-lysine, chromosome integration vectors, episomal vectors, etc.

“Substantially an entire length of a polynucleotide or amino acid sequence” refers to at least about 70%, generally at least about 80%, or typically about 90% or more of a sequence.

As used herein, an “antibody” refers to a protein comprising one or more polypeptides substantially or partially encoded by immunoglobulin genes or fragments of immunoglobulin genes. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as myriad immunoglobulin variable region genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes IgG, IgM, IgA, IgD and IgE, respectively. A typical immunoglobulin (antibody) structural unit comprises a tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kD) and one “heavy” chain (about 50-70 kD). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms variable light chain (V_(L)) and variable heavy chain (V_(H)) refer to these light and heavy chains, respectively. Antibodies exist as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. Thus, for example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab′)2, a dimer of Fab which itself is a light chain joined to V_(H)-C_(H)1 by a disulfide bond. The F(ab′)2 may be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab′)2 dimer into an Fab′ monomer. The Fab′ monomer is essentially a Fab with part of the hinge region (see Paul, ed. (1998) Fundamental Immunology (4^(th) Edition, Raven Press, NY) for a more detailed description of other antibody fragments). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such Fab′ fragments may be synthesized de novo either chemically or by utilizing recombinant DNA methodology. Thus, the term antibody as used herein also includes antibody fragments either produced by the modification of whole antibodies or synthesized de novo using recombinant DNA methodologies. Antibodies include single chain antibodies, including single chain Fv (sFv) antibodies in which a variable heavy and a variable light chain are joined together (directly or through a peptide linker) to form a continuous polypeptide.

A “chloroplast transit peptide” is an amino acid sequence which is translated in conjunction with a protein and directs the protein to the chloroplast or other plastid types present in the cell in which the protein is made. “Chloroplast transit sequence” refers to a nucleotide sequence that encodes a chloroplast transit peptide.

A “signal peptide” is an amino acid sequence which is translated in conjunction with a protein and directs the protein to the secretory system (Chrispeels (1991) Ann. Rev. Plant Phys. Plant Mol. Biol. 42: 21-53). If the protein is to be directed to a vacuole, a vacuolar targeting signal can further be added; if it is to be directed to the endoplasmic reticulum, an endoplasmic reticulum retention signal may be added. If the protein is to be directed to the nucleus, any signal peptide present should be removed and a nuclear localization signal included instead (Raikhel (1992) Plant Phys. 100: 1627-1632).

The terms “diversification” and “diversity,” as applied to a polynucleotide, refer to the generation of a plurality of modified forms of a parental polynucleotide, or of a plurality of parental polynucleotides. In the case where the polynucleotide encodes a polypeptide, diversity in the nucleotide sequence of the polynucleotide can result in diversity in the corresponding encoded polypeptide, e.g., a diverse pool of polynucleotides encoding a plurality of polypeptide variants. In some embodiments of the invention, this sequence diversity is exploited by screening/selecting a library of diversified polynucleotides for variants with desirable functional attributes, e.g., a polynucleotide encoding a GAT polypeptide with enhanced functional characteristics.

The term “encoding” refers to the ability of a nucleotide sequence to code for one or more amino acids. The term does not require a start or stop codon. An amino acid sequence can be encoded in any one of six different reading frames provided by a polynucleotide sequence and its complement.
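The six reading frames mentioned above can be enumerated mechanically: three offsets on the given strand and three on its reverse complement. A minimal Python sketch follows; the function names are illustrative, and each frame is trimmed to a whole number of codons.

```python
# Translation table mapping each DNA base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> list[str]:
    """Return the codon-aligned sequence of each of the six reading frames.

    Frames 1-3 read the given strand from offsets 0, 1 and 2; frames 4-6 read
    its reverse complement from the same offsets. Any trailing partial codon
    is dropped.
    """
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            usable = (len(strand) - offset) // 3 * 3
            frames.append(strand[offset:offset + usable])
    return frames
```

For example, `six_frames("ATGGCC")` yields `["ATGGCC", "TGG", "GGC", "GGCCAT", "GCC", "CCA"]`; any of these could be the frame in which an amino acid sequence is encoded.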

When used herein, the term “artificial variant” refers to a polypeptide having GAT activity, which is encoded by a modified GAT polynucleotide, e.g., a modified form of any one of SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952 or of a naturally occurring GAT polynucleotide isolated from an organism. The modified polynucleotide, from which an artificial variant is produced when expressed in a suitable host, is obtained through human intervention by modification of a GAT polynucleotide.

The term “nucleic acid construct” or “polynucleotide construct” means a nucleic acid molecule, either single-stranded or double-stranded, which is isolated from a naturally occurring gene or which has been modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature. The term nucleic acid construct is synonymous with the term “expression cassette” when the nucleic acid construct contains the control sequences required for expression of a coding sequence of the present invention.

The term “control sequences” is defined herein to include all components that are necessary or advantageous for the expression of a polypeptide of the present invention. Each control sequence may be native or foreign to the nucleotide sequence encoding the polypeptide. Such control sequences include, but are not limited to, a leader sequence, polyadenylation sequence, propeptide sequence, promoter sequence, signal peptide sequence, and transcription terminator sequence. At a minimum, the control sequences include a promoter and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the nucleotide sequence encoding a polypeptide.

The term “operably linked” is defined herein as a configuration in which a control sequence is appropriately placed at a position relative to the coding sequence of the DNA sequence such that the control sequence directs the expression of a polypeptide.

When used herein, the term “coding sequence” is intended to cover a nucleotide sequence that directly specifies the amino acid sequence of its protein product. The boundaries of the coding sequence are generally determined by an open reading frame, which usually begins with the ATG start codon. The coding sequence typically includes a DNA, cDNA, and/or recombinant nucleotide sequence.

In the present context, the term “expression” includes any step involved in the production of the polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.

In the present context, the term “expression vector” covers a DNA molecule, linear or circular, that comprises a segment encoding a polypeptide of the invention, and which is operably linked to additional segments that provide for its transcription.

The term “host cell”, as used herein, includes any cell type which is susceptible to transformation with a nucleic acid construct.

The term “plant” includes whole plants, shoot vegetative organs/structures (e.g., leaves, stems and tubers), roots, flowers and floral organs/structures (e.g., bracts, sepals, petals, stamens, carpels, anthers and ovules), seed (including embryo, endosperm, and seed coat) and fruit (the mature ovary), plant tissue (e.g., vascular tissue, ground tissue, and the like) and cells (e.g., guard cells, egg cells, trichomes and the like), and progeny of same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and lower plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), gymnosperms, ferns, and multicellular algae. It includes plants of a variety of ploidy levels, including aneuploid, polyploid, diploid, haploid and hemizygous.

The term “heterologous” as used herein describes a relationship between two or more elements which indicates that the elements are not normally found in proximity to one another in nature. Thus, for example, a polynucleotide sequence is “heterologous to” an organism or a second polynucleotide sequence if it originates from a foreign species, or, if from the same species, is modified from its original form. For example, a promoter operably linked to a heterologous coding sequence refers to a coding sequence from a species different from that from which the promoter was derived, or, if from the same species, a coding sequence which is not naturally associated with the promoter (e.g., a genetically engineered coding sequence or an allele from a different ecotype or variety). An example of a heterologous polypeptide is a polypeptide expressed from a recombinant polynucleotide in a transgenic organism. Heterologous polynucleotides and polypeptides are forms of recombinant molecules.

A variety of additional terms are defined or otherwise characterized herein.

Glyphosate-N-Acetyltransferases

In one aspect, the invention provides a novel family of isolated or recombinant enzymes referred to herein as “glyphosate-N-acetyltransferases,” “GATs,” or “GAT enzymes.” GATs are enzymes that have GAT activity, preferably sufficient activity to confer some degree of glyphosate tolerance upon a transgenic plant engineered to express the GAT. Some examples of GATs include GAT polypeptides, described in more detail below.

GAT-mediated glyphosate tolerance is a complex function of GAT activity, GAT expression levels in the transgenic plant, the particular plant, and numerous other factors, including but not limited to the nature and timing of herbicide application. One of skill in the art can determine without undue experimentation the level of GAT activity required to effect glyphosate tolerance in a particular context.

GAT activity can be characterized using the conventional kinetic parameters k_(cat), K_(M), and k_(cat)/K_(M). k_(cat) can be thought of as a measure of the rate of acetylation, particularly at high substrate concentrations; K_(M) is a measure of the affinity of the GAT for its substrates (e.g., acetyl CoA, propionyl CoA and glyphosate); and k_(cat)/K_(M) is a measure of catalytic efficiency that takes both substrate affinity and catalytic rate into account. k_(cat)/K_(M) is particularly important in the situation where the concentration of a substrate is at least partially rate-limiting. In general, and other parameters being equal, a GAT with a higher k_(cat) or k_(cat)/K_(M) is a more efficient catalyst than another GAT with lower k_(cat) or k_(cat)/K_(M), and a GAT with a lower K_(M) is a more efficient catalyst than another GAT with a higher K_(M). Thus, to determine whether one GAT is more effective than another, one can compare kinetic parameters for the two enzymes. The relative importance of k_(cat), k_(cat)/K_(M) and K_(M) will vary depending upon the context in which the GAT will be expected to function, e.g., the anticipated effective concentration of glyphosate relative to the K_(M) for glyphosate. GAT activity can also be characterized in terms of any of a number of functional characteristics, including but not limited to stability, susceptibility to inhibition, or activation by other molecules.
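To make concrete why the relative importance of these parameters depends on glyphosate concentration, the following Python sketch evaluates the standard Michaelis-Menten rate law for two hypothetical GAT variants. The variant names and their k_(cat) and K_(M) values are invented for illustration and are not measured values for any enzyme described here.

```python
def initial_rate(k_cat: float, k_m: float, substrate: float, enzyme: float = 1.0) -> float:
    """Michaelis-Menten initial rate: v = k_cat * [E] * [S] / (K_M + [S])."""
    return k_cat * enzyme * substrate / (k_m + substrate)


# Two hypothetical GAT variants as (k_cat in min^-1, K_M in mM).
# Variant A: higher k_cat but weaker binding, so k_cat/K_M = 10 mM^-1 min^-1.
# Variant B: lower k_cat but tighter binding, so k_cat/K_M = 40 mM^-1 min^-1.
VARIANTS = {"A": (100.0, 10.0), "B": (20.0, 0.5)}


def faster_variant(substrate_mm: float) -> str:
    """Name the variant with the higher initial rate at the given [S] (mM)."""
    return max(VARIANTS, key=lambda name: initial_rate(*VARIANTS[name], substrate_mm))
```

Here `faster_variant(0.1)` returns `"B"`, because at low [S] the rate approaches (k_(cat)/K_(M))[E][S], while `faster_variant(100.0)` returns `"A"`, because near saturation the rate approaches k_(cat)[E]. The two rates are equal where 100[S]/(10 + [S]) = 20[S]/(0.5 + [S]), i.e., at [S] = 1.875 mM.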

Glyphosate-N-Acetyltransferase Polypeptides

In one aspect, the invention provides a novel family of isolated or recombinant polypeptides referred to herein as “glyphosate-N-acetyltransferase polypeptides” or “GAT polypeptides.” GAT polypeptides are characterized by their structural similarity to a novel family of GATs. Many but not all GAT polypeptides are GATs. The distinction is that GATs are defined in terms of function, whereas GAT polypeptides are defined in terms of structure. A subset of the GAT polypeptides consists of those GAT polypeptides that have GAT activity, preferably at a level that will function to confer glyphosate resistance upon a transgenic plant expressing the protein at an effective level. Some preferred GAT polypeptides for use in conferring glyphosate tolerance have a k_(cat) of at least 1 min⁻¹, or more preferably at least 10 min⁻¹, 100 min⁻¹ or 1000 min⁻¹. Other preferred GAT polypeptides for use in conferring glyphosate tolerance have a K_(M) no greater than 100 mM, or more preferably no greater than 10 mM, 1 mM, or 0.1 mM. Still other preferred GAT polypeptides for use in conferring glyphosate tolerance have a k_(cat)/K_(M) of at least 1 mM⁻¹ min⁻¹, or more preferably at least 10 mM⁻¹ min⁻¹, 100 mM⁻¹ min⁻¹, 1000 mM⁻¹ min⁻¹, or 10,000 mM⁻¹ min⁻¹.
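The preference levels above form three ordered ladders of thresholds. As an illustrative sketch only, the Python below reports the most stringent level a measured enzyme reaches on each ladder; the function names are hypothetical, and the sketch assumes K_(M) refers to glyphosate and that units are min⁻¹ and mM as stated above.

```python
from typing import Optional

# Preference levels recited above, ordered from least to most stringent.
K_CAT_TIERS = [1, 10, 100, 1000]               # k_cat in min^-1, "at least"
K_M_TIERS = [100, 10, 1, 0.1]                  # K_M in mM, "no greater than"
EFFICIENCY_TIERS = [1, 10, 100, 1000, 10_000]  # k_cat/K_M in mM^-1 min^-1, "at least"


def highest_tier(value: float, tiers: list, at_least: bool = True) -> Optional[float]:
    """Return the most stringent level that value satisfies, or None if none apply."""
    met = None
    for threshold in tiers:
        if (value >= threshold) if at_least else (value <= threshold):
            met = threshold
    return met


def summarize_tiers(k_cat: float, k_m: float) -> dict:
    """Report the level reached by k_cat (min^-1), K_M (mM), and k_cat/K_M."""
    return {
        "k_cat": highest_tier(k_cat, K_CAT_TIERS),
        "K_M": highest_tier(k_m, K_M_TIERS, at_least=False),
        "k_cat/K_M": highest_tier(k_cat / k_m, EFFICIENCY_TIERS),
    }
```

Using the B. licheniformis values reported for glyphosate in the following paragraph (k_(cat) = 6 min⁻¹, K_(M) = 2.9 mM), `summarize_tiers(6.0, 2.9)` returns `{"k_cat": 1, "K_M": 10, "k_cat/K_M": 1}`.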

Exemplary GAT polypeptides have been isolated and characterized from a variety of bacterial strains. One example of a monomeric GAT polypeptide that has been isolated and characterized has a molecular mass of approximately 17 kD. An exemplary GAT enzyme isolated from a strain of B. licheniformis, SEQ ID NO:7, exhibits a K_(M) for glyphosate of approximately 2.9 mM and a K_(M) for acetyl CoA of approximately 2 μM, with a k_(cat) equal to 6 min⁻¹.
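Taking the reported k_(cat) as the turnover number for glyphosate, the two glyphosate parameters of this B. licheniformis enzyme combine into a catalytic efficiency toward glyphosate of roughly:

$$
\frac{k_{\mathrm{cat}}}{K_M} = \frac{6\ \mathrm{min}^{-1}}{2.9\ \mathrm{mM}} \approx 2.1\ \mathrm{mM}^{-1}\,\mathrm{min}^{-1}
$$

On this arithmetic the enzyme meets the 1 mM⁻¹ min⁻¹ level recited for k_(cat)/K_(M) but not the 10 mM⁻¹ min⁻¹ level.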

The term “GAT polypeptide” refers to any polypeptide comprising an amino acid sequence that can be optimally aligned with an amino acid sequence selected from the group consisting of SEQ ID NO:300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, wherein at least one of the following positions conforms to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P. Some aspects of the invention pertain to GAT polypeptides comprising an amino acid sequence that can be optimally aligned with an amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550, 555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645, 650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, or 760 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, wherein one or more of the following positions conform to the following restrictions: (i) at positions 18 and 38, a Z5 amino acid residue; (ii) at position 62, a Z1 amino acid residue; (iii) at position 124, a Z6 amino acid residue; and (iv) at position 144, a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P.
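Once a candidate has been optimally aligned to one of the reference sequences and its similarity score computed, the positional restrictions reduce to set-membership tests. The Python sketch below assumes the aligned residues are already available as a mapping from reference position to residue (obtaining that mapping is outside this sketch); the helper names are hypothetical, and the default cutoff of 460 mirrors the first definition above.

```python
from collections.abc import Mapping

# Residue classes as defined above.
Z1 = frozenset("AILMV")
Z2 = frozenset("FWY")
Z5 = frozenset("DE")
Z6 = frozenset("CGP")

# (label, reference positions, allowed class); restriction (i) needs both 18 and 38.
RESTRICTIONS = [
    ("i", (18, 38), Z5),
    ("ii", (62,), Z1),
    ("iii", (124,), Z6),
    ("iv", (144,), Z2),
]


def restrictions_met(residues: Mapping[int, str]) -> list[str]:
    """List the restrictions satisfied by residues keyed by reference position.

    residues maps a position in the reference sequence (e.g., SEQ ID NO:300)
    to the test-sequence residue aligned there; unaligned positions are absent
    and therefore never satisfy a restriction.
    """
    met = []
    for label, positions, allowed in RESTRICTIONS:
        if all(residues.get(p, "-") in allowed for p in positions):
            met.append(label)
    return met


def passes_structural_test(score: float, residues: Mapping[int, str],
                           min_score: float = 460) -> bool:
    """Similarity score at or above the cutoff plus at least one restriction met."""
    return score >= min_score and bool(restrictions_met(residues))
```

For instance, a candidate with D at position 18, E at position 38 and G at position 124 satisfies restrictions (i) and (iii), so with a similarity score of 470 it would pass this test.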

Two sequences are “optimally aligned” when they are aligned for similarity scoring using a defined amino acid substitution matrix (e.g., BLOSUM62), gap existence penalty and gap extension penalty so as to arrive at the highest score possible for that pair of sequences. Amino acid substitution matrices and their use in quantifying the similarity between two sequences are well known in the art and described, e.g., in Dayhoff et al. (1978) “A model of evolutionary change in proteins” in “Atlas of Protein Sequence and Structure,” Vol. 5, Suppl. 3 (ed. M. O. Dayhoff), pp. 345-352, Natl. Biomed. Res. Found., Washington, D.C., and Henikoff et al. (1992) Proc. Nat'l. Acad. Sci. USA 89: 10915-10919. The BLOSUM62 matrix (FIG. 10) is often used as a default scoring substitution matrix in sequence alignment protocols such as Gapped BLAST 2.0. The gap existence penalty is imposed for the introduction of a single amino acid gap in one of the aligned sequences, and the gap extension penalty is imposed for each additional empty amino acid position inserted into an already opened gap. The alignment is defined by the amino acid positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences so as to arrive at the highest possible score. While optimal alignment and scoring can be accomplished manually, the process is facilitated by the use of a computer-implemented alignment algorithm, e.g., gapped BLAST 2.0, described in Altschul et al. (1997) Nucl. Acids Res. 25: 3389-3402, and made available to the public at the National Center for Biotechnology Information (NCBI) website (www.ncbi.nlm.nih.gov). Optimal alignments, including multiple alignments, can be prepared using, e.g., PSI-BLAST, available through the NCBI website and described by Altschul et al. (1997) Nucl. Acids Res. 25: 3389-3402.
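The scoring scheme described above can be illustrated with a minimal local-alignment (Smith-Waterman) scorer using affine gap penalties in the style of Gotoh. This is a sketch, not BLAST itself: the substitution function is supplied by the caller (a BLOSUM62 lookup is assumed but not reproduced here), and the gap costs follow the wording above, charging the existence penalty for the first empty position and the extension penalty for each further one. BLAST's own convention charges existence plus extension for a one-residue gap, so scores from this sketch and from BLAST can differ slightly.

```python
from typing import Callable

NEG_INF = float("-inf")


def local_affine_score(a: str, b: str, sub: Callable[[str, str], int],
                       gap_open: int = 11, gap_extend: int = 1) -> int:
    """Best local alignment score of a and b (Smith-Waterman, affine gaps).

    sub(x, y) returns the substitution score for residues x and y. The first
    position of a gap costs gap_open; each further position of the same gap
    costs gap_extend.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ...ending in a gap within a
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ...ending in a gap within b
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap from H, or extend an existing gap of the same kind.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            # Local alignment: a score never drops below zero (restart point).
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With a toy scorer of +5 for a match and -4 for a mismatch, aligning `"ACDEFGHIK"` with `"ACDEGHIK"` scores 29: eight matches (40) less one single-position gap (11), which beats any ungapped local alignment (20).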

With respect to an amino acid sequence that is optimally aligned with a reference sequence, an amino acid residue “corresponds to” the position in the reference sequence with which the residue is paired in the alignment. The “position” is denoted by a number that sequentially identifies each amino acid in the reference sequence based on its position relative to the N-terminus. For example, in SEQ ID NO:300, position 1 is M, position 2 is I, position 3 is E, etc. When a test sequence is optimally aligned with SEQ ID NO:300, a residue in the test sequence that aligns with the E at position 3 is said to “correspond to position 3” of SEQ ID NO:300. Owing to deletions, insertions, truncations, fusions, etc., that must be taken into account when determining an optimal alignment, the amino acid residue number in a test sequence, as determined by simply counting from the N-terminus, will not necessarily be the same as the number of its corresponding position in the reference sequence. For example, where there is a deletion in an aligned test sequence, there will be no amino acid that corresponds to a position in the reference sequence at the site of the deletion. Where there is an insertion in an aligned test sequence, the inserted residues will not correspond to any amino acid position in the reference sequence. In the case of truncations or fusions, there can be stretches of amino acids in either the reference or the aligned sequence that do not correspond to any amino acid in the other sequence.
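The position bookkeeping described above can be sketched as follows, assuming the optimal alignment is already available as two equal-length gapped strings with "-" marking gaps (a hypothetical input format; alignment tools emit various formats). The reference position advances only on non-gap reference characters and the test residue number only on non-gap test characters, so a deletion leaves a reference position unmatched and an insertion leaves a test residue unmapped.

```python
from typing import Dict


def corresponding_positions(aligned_ref: str, aligned_test: str,
                            gap: str = "-") -> Dict[int, int]:
    """Map 1-based test residue numbers to the 1-based reference positions they correspond to."""
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    ref_pos = test_pos = 0
    for r, t in zip(aligned_ref, aligned_test):
        if r != gap:
            ref_pos += 1   # counts reference residues from the N-terminus
        if t != gap:
            test_pos += 1  # counts test residues from the N-terminus
        if r != gap and t != gap:
            mapping[test_pos] = ref_pos
    return mapping


# Toy alignment: the test sequence lacks the reference I (a deletion)
# and carries an extra A (an insertion).
print(corresponding_positions("MIEK-L", "M-EKAL"))  # {1: 1, 2: 3, 3: 4, 5: 5}
```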

The term “GAT polypeptide” further refers to any polypeptide comprising an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590.
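Because the definition above is a list of alternatives, a candidate qualifies if it meets the threshold for any one listed reference. A minimal sketch of that check, assuming percent identities to each reference have already been computed by the method defined in this section (the dictionary and function names are hypothetical):

```python
from typing import Dict, List

# Minimum percent identity to each reference SEQ ID NO, transcribed from items (a)-(n).
GAT_IDENTITY_THRESHOLDS: Dict[int, float] = {
    577: 98, 578: 97, 621: 97, 579: 98, 602: 98, 697: 95, 721: 96,
    613: 97, 677: 89, 584: 96, 707: 98, 616: 98, 612: 96, 590: 98,
}


def satisfied_references(identities: Dict[int, float]) -> List[int]:
    """Return the SEQ ID NOs whose threshold is met by the precomputed identities."""
    return sorted(
        seq_id for seq_id, pct in identities.items()
        if seq_id in GAT_IDENTITY_THRESHOLDS
        and pct >= GAT_IDENTITY_THRESHOLDS[seq_id]
    )


print(satisfied_references({577: 98.5, 677: 88.0, 697: 95.0}))  # [577, 697]
```

A non-empty result means at least one alternative of the definition is met.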

The term “GAT polypeptide” further refers to any polypeptide comprising an amino acid sequence having at least 89% sequence identity with residues 1-96 of the amino acid sequence of SEQ ID NO:677; an amino acid sequence having at least 95% sequence identity with residues 1-96 of the amino acid sequence of SEQ ID NO:697; an amino acid sequence having at least 96% sequence identity with residues 1-96 of the amino acid sequence selected from the group consisting of SEQ ID NO:584, 612, and 721; an amino acid sequence having at least 97% sequence identity with residues 1-96 of the amino acid sequence selected from the group consisting of SEQ ID NO:578, 613, and 621; or an amino acid sequence having at least 98% sequence identity with residues 1-96 of the amino acid sequence selected from the group consisting of SEQ ID NO:577, 579, 590, 602, 616, and 707.

The term “GAT polypeptide” further refers to any polypeptide comprising an amino acid sequence having at least 89% sequence identity with residues 51-146 of the amino acid sequence of SEQ ID NO:677; an amino acid sequence having at least 95% sequence identity with residues 51-146 of the amino acid sequence of SEQ ID NO:697; an amino acid sequence having at least 96% sequence identity with residues 51-146 of the amino acid sequence selected from the group consisting of SEQ ID NO:584, 612, and 721; an amino acid sequence having at least 97% sequence identity with residues 51-146 of the amino acid sequence selected from the group consisting of SEQ ID NO:578, 613, and 621; or an amino acid sequence having at least 98% sequence identity with residues 51-146 of the amino acid sequence selected from the group consisting of SEQ ID NO:577, 579, 590, 602, 616, and 707.
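The two window definitions above use the same per-reference thresholds as the full-length definition, and both windows span exactly 96 residues (1-96 and 51-146, overlapping at residues 51-96). A small sketch of 1-based, inclusive window extraction and the grouped thresholds, with hypothetical names:

```python
from typing import Dict, Tuple

# Percent identity threshold -> reference SEQ ID NOs, as grouped in the two paragraphs above.
WINDOW_THRESHOLDS: Dict[int, Tuple[int, ...]] = {
    89: (677,),
    95: (697,),
    96: (584, 612, 721),
    97: (578, 613, 621),
    98: (577, 579, 590, 602, 616, 707),
}


def window(seq: str, start: int, end: int) -> str:
    """Return residues start..end using the text's 1-based, inclusive numbering."""
    return seq[start - 1:end]


def threshold_for(seq_id: int) -> int:
    """Minimum percent identity required against the given SEQ ID NO."""
    for pct, ids in WINDOW_THRESHOLDS.items():
        if seq_id in ids:
            return pct
    raise KeyError(seq_id)
```

The slice `seq[start - 1:end]` is the usual conversion from inclusive 1-based ranges to Python's half-open 0-based slices.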

The term “GAT polypeptide” further refers to any polypeptide comprising an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 96% identical to residues 2-146 of SEQ ID NO:919; (b) an amino acid sequence that is at least 97% identical to residues 2-146 of SEQ ID NO:929; (c) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:847; (d) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO: δ 1; (e) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:853; (f) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:855 (such as, for example, SEQ ID NO:835 or 855); (g) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:857; (h) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:861; (i) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:871; (j) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:875; (k) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:881; (l) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:885; (m) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:887; (n) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:889; (o) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:893; (p) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:897; (q) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:899; (r) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:909; (s) an amino acid sequence that is at least 98% identical to residues 2-146 of SEQ ID NO:911; (t) an amino acid sequence that is at least 99% identical to residues 2-146 of SEQ ID NO:837; (u) an amino acid sequence that is at least 99% identical to residues 2-146 of SEQ ID NO:841; (v) an amino acid sequence that is at least 99% identical to residues 2-146 of SEQ ID NO:865; (w) an amino acid sequence that is at least 99% identical to residues 2-146 of SEQ ID NO:869; and (x) an amino acid sequence that is at least 99% identical to residues 2-146 of SEQ ID NO:879.

The term “GAT polypeptide” further refers to any polypeptide comprising an amino acid sequence that is at least 95% identical to residues 2-146 of SEQ ID NO:929 and which comprises a Gly or an Asn residue at the amino acid position corresponding to position 33 of SEQ ID NO:929.
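The definition above combines an identity threshold with a residue requirement at one aligned position. A sketch of that combined check, assuming the percent identity has already been computed as defined in this section and that the alignment is supplied as two equal-length gapped strings in which the reference is full-length SEQ ID NO:929, so that position 33 is counted from its residue 1. These input conventions and the function name are hypothetical, and the toy strings in the tests are not taken from the sequence listing.

```python
def meets_seq929_rule(aligned_ref: str, aligned_test: str, identity_pct: float,
                      gap: str = "-") -> bool:
    """True if identity is at least 95% and the residue aligned to reference position 33 is G or N."""
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    if identity_pct < 95.0:
        return False
    ref_pos = 0
    for r, t in zip(aligned_ref, aligned_test):
        if r == gap:
            continue  # inserted test residues do not correspond to any reference position
        ref_pos += 1
        if ref_pos == 33:
            return t in ("G", "N")  # a gap here means no corresponding residue
    return False  # reference shorter than 33 residues in this alignment
```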

The term “GAT polypeptide” further refers to any polypeptide comprising an amino acid sequence that shares at least 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity with an exemplary GAT polypeptide disclosed herein. Thus, for example, GAT polypeptides of the invention include polypeptides comprising an amino acid sequence that shares at least 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity with any of SEQ ID NO: 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972.

As used herein, the term “identity” or “percent identity,” when used with respect to a particular pair of aligned amino acid sequences, refers to the percent amino acid sequence identity that is obtained by ClustalW analysis (version 1.8, available from the European Bioinformatics Institute, Cambridge, UK), counting the number of identical matches in the alignment and dividing that number of identical matches by the greater of (i) the length of the aligned sequences and (ii) 96, and using the following default ClustalW parameters to achieve slow/accurate pairwise alignments: Gap Open Penalty: 10; Gap Extension Penalty: 0.10; Protein weight matrix: Gonnet series; DNA weight matrix: IUB; Toggle Slow/Fast pairwise alignments=SLOW or FULL Alignment.
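The identity calculation above (identical matches divided by the greater of the alignment length and 96) can be sketched as below. It does not reproduce the ClustalW alignment step: it assumes a pairwise alignment is already in hand as two equal-length gapped strings, and it reads “the length of the aligned sequences” as the number of alignment columns, which is an interpretation. Because of the floor of 96, a short perfect match cannot reach 100%.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-",
                     min_length: int = 96) -> float:
    """Identical, non-gap columns divided by max(alignment length, min_length), as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / max(len(aligned_a), min_length)
```

For example, a gap-free 50-residue identical stretch scores 50/96 of 100%, about 52.1%.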

In another aspect, the invention provides an isolated or recombinant polypeptide that comprises at least 20, or alternatively, at least 50, at least 75, at least 100, at least 125, at least 130, at least 135, at least 140, at least 141, at least 142, at least 143, at least 144, or at least 145 contiguous amino acids of an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590.
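Whether a candidate contains a run of at least N contiguous residues from a given sequence can be checked with the classic longest-common-substring dynamic program, sketched below. The aspect above refers to fragments of sequences that are themselves at least 89-98% identical to the listed references, so this sketch, which compares against one concrete sequence, covers only the simplest case; the function names are hypothetical.

```python
def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest run of contiguous residues shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the shared run ending at a[i-2], b[j-2] by one residue.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def contains_fragment(candidate: str, source: str, min_length: int = 20) -> bool:
    """True if candidate contains at least min_length contiguous residues of source."""
    return longest_shared_run(candidate, source) >= min_length


print(longest_shared_run("MIEVKPINAE", "XXEVKPIYY"))  # 5 ("EVKPI")
```

Keeping only the previous row makes memory proportional to the shorter input's length rather than the product of both lengths.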

In another aspect, the invention provides a polypeptide comprising residues 2-146 of an amino acid sequence selected from the group consisting of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, and 825. In some embodiments of the invention, the amino acid sequence of the polypeptide comprises Met, Met-Ala, or Met-Ala-Ala on the N-terminal side of the amino acid corresponding to position 2 of the reference amino acid sequence.
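The construct described above keeps residues 2-146 of a reference and optionally prefixes Met, Met-Ala, or Met-Ala-Ala. A sketch that enumerates those variants for a given reference string (the function name is hypothetical, and the toy input below is not a sequence from the listing):

```python
from typing import List

# No prefix, Met, Met-Ala, Met-Ala-Ala, in one-letter code.
N_TERMINAL_PREFIXES = ("", "M", "MA", "MAA")


def n_terminal_variants(reference: str) -> List[str]:
    """Residues 2-146 of reference, alone and with each allowed N-terminal prefix."""
    core = reference[1:146]  # residues 2-146 in 1-based, inclusive numbering
    return [prefix + core for prefix in N_TERMINAL_PREFIXES]


print(n_terminal_variants("MIEVK"))  # ['IEVK', 'MIEVK', 'MAIEVK', 'MAAIEVK']
```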

Some preferred GAT polypeptides of the invention can be optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO:300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, wherein at least one of the following positions conforms to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P, and further wherein of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 90% conform to the following restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 31, 45, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 123, 129, 139, and/or 145 the amino acid residue is B1; and (b) at positions 3, 5, 8, 10, 11, 14, 17, 24, 27, 32, 37, 47, 48, 49, 52, 57, 58, 61, 63, 68, 69, 79, 80, 82, 83, 89, 92, 100, 101, 104, 119, 120, 125, 126, 128, 131, and/or 143 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T. When used to specify an amino acid or amino acid residue, the single-letter designations A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y have their standard meaning as used in the art and as provided in Table 1 herein.
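The “at least 90% conform” condition above can be read as a fraction computed over the restricted positions. A sketch under two stated assumptions: the test residues are supplied as a mapping from reference position to the aligned one-letter code (a hypothetical input format), and, following the wording “of the amino acid residues ... that correspond to the following positions,” reference positions with no corresponding residue are left out of the denominator, which is an interpretation. Names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

B1: FrozenSet[str] = frozenset("AILMFWYV")
B2: FrozenSet[str] = frozenset("RNDCQEGHKPST")

# Reference positions restricted to B1 (item (a)) and B2 (item (b)) above.
B1_POSITIONS = (2, 4, 15, 19, 26, 28, 31, 45, 51, 54, 86, 90, 91, 97, 103,
                105, 106, 114, 123, 129, 139, 145)
B2_POSITIONS = (3, 5, 8, 10, 11, 14, 17, 24, 27, 32, 37, 47, 48, 49, 52, 57,
                58, 61, 63, 68, 69, 79, 80, 82, 83, 89, 92, 100, 101, 104,
                119, 120, 125, 126, 128, 131, 143)


def conforming_fraction(residues: Dict[int, str]) -> float:
    """Fraction of restricted positions, among those present, whose residue meets its class."""
    rules: List[Tuple[int, FrozenSet[str]]] = (
        [(p, B1) for p in B1_POSITIONS] + [(p, B2) for p in B2_POSITIONS]
    )
    checked = [(p, allowed) for p, allowed in rules if p in residues]
    if not checked:
        return 0.0
    hits = sum(1 for p, allowed in checked if residues[p] in allowed)
    return hits / len(checked)


residues = {p: "L" for p in B1_POSITIONS}
residues.update({p: "G" for p in B2_POSITIONS})
residues[2] = "G"  # one non-conforming residue out of 59 restricted positions
print(conforming_fraction(residues) >= 0.9)  # True
```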

Some preferred GAT polypeptides of the invention can be optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, wherein at least one of the following positions conforms to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P, and further wherein of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 129, 139, and/or 145 the amino acid residue is Z1; (b) at positions 31 and/or 45 the amino acid residue is Z2; (c) at position 8 the amino acid residue is Z3; (d) at position 89 the amino acid residue is Z3 or Z6; (e) at positions 82, 92, 101 and/or 120 the amino acid residue is Z4; (f) at positions 3, 11, 27 and/or 79 the amino acid residue is Z5; (g) at position 18 the amino acid residue is Z4 or Z5; (h) at position 123 the amino acid residue is Z1 or Z2; (i) at positions 12, 33, 35, 39, 53, 59, 112, 132, 135, 140, and/or 146 the amino acid residue is Z1 or Z3; (j) at position 30 the amino acid residue is Z1; (k) at position 6 the amino acid residue is Z6; (l) at position 81 the amino acid residue is Z2 or Z4; (m) at position 113 the amino acid residue is Z3; (n) at position 138 the amino acid residue is Z4; (o) at position 142 the amino acid residue is Z2; (p) at positions 57 and/or 126 the amino acid residue is Z3 or Z4; (q) at positions 5, 17, and 61 the amino acid residue is Z4; (r) at position 24 the amino acid residue is Z3; (s) at position 104 the amino acid residue is Z5; (t) at positions 52 and/or 69 the amino acid residue is Z3; (u) at positions 14 and/or 119 the amino acid residue is Z5; (v) at positions 10, 32, 63, and/or 83 the amino acid residue is Z5; (w) at positions 48 and/or 80 the amino acid residue is Z6; (x) at position 40 the amino acid residue is Z1 or Z2; (y) at position 96 the amino acid residue is Z3 or Z5; (z) at position 65 the amino acid residue is Z3, Z4, or Z6; (aa) at positions 84 and/or 115 the amino acid residue is Z3; (ab) at position 93 the amino acid residue is Z4; (ac) at position 130 the amino acid residue is Z2; (ad) at position 58 the amino acid residue is Z3, Z4 or Z6; (ae) at position 47 the amino acid residue is Z4 or Z6; (af) at positions 49 and/or 100 the amino acid residue is Z3 or Z4; (ag) at position 68 the amino acid residue is Z4 or Z5; (ah) at position 143 the amino acid residue is Z4; (ai) at position 131 the amino acid residue is Z5; (aj) at positions 125 and/or 128 the amino acid residue is Z5; (ak) at position 67 the amino acid residue is Z3 or Z4; (al) at position 60 the amino acid residue is Z5; and (am) at position 37 the amino acid residue is Z4 or Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.
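Restriction set (i)-(iv) above is satisfied when any one of its four alternatives holds, and alternative (i) requires Z5 at both positions 18 and 38. A sketch, assuming the test residues are supplied as a mapping from reference position to one-letter code (a hypothetical input format), with a missing key meaning no corresponding residue; the function name is hypothetical.

```python
from typing import Dict, FrozenSet

Z1: FrozenSet[str] = frozenset("AILMV")
Z2: FrozenSet[str] = frozenset("FWY")
Z5: FrozenSet[str] = frozenset("DE")
Z6: FrozenSet[str] = frozenset("CGP")


def meets_key_position_rule(residues: Dict[int, str]) -> bool:
    """True if at least one of alternatives (i)-(iv) holds."""
    def has(pos: int, cls: FrozenSet[str]) -> bool:
        # residues.get returns None when there is no corresponding residue;
        # None is never a member of a residue class.
        return residues.get(pos) in cls

    return ((has(18, Z5) and has(38, Z5))  # (i)
            or has(62, Z1)                 # (ii)
            or has(124, Z6)                # (iii)
            or has(144, Z2))               # (iv)
```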

Some preferred GAT polypeptides of the invention further comprise the amino acid residues in the amino acid sequence that correspond to the positions specified in (a)-(am), wherein at least 90% conform to the amino acid residue restrictions specified in (a)-(am).

Some preferred GAT polypeptides of the invention additionally comprise amino acid residues in the amino acid sequence that correspond to the following positions, wherein at least 90% conform to the following restrictions: (a) at positions 1, 7, 9, 13, 20, 36, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred GAT polypeptides of the invention additionally comprise amino acid residues in the amino acid sequence that correspond to the following positions, wherein at least 90% conform to the following restrictions: (a) at positions 1, 7, 9, 13, 20, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 36, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred GAT polypeptides of the invention additionally comprise amino acid residues in the amino acid sequence that correspond to the following positions, wherein at least 90% conform to the following restrictions: (a) at positions 1, 7, 9, 20, 42, 50, 72, 75, 76, 78, 94, 98, 110, 121, and/or 141 the amino acid residue is Z1; (b) at positions 13, 46, 56, 70, 107, 117, and/or 118 the amino acid residue is Z2; (c) at positions 23, 55, 71, 77, 88, and/or 109 the amino acid residue is Z3; (d) at positions 16, 21, 41, 73, 85, 99, and/or 111 the amino acid residue is Z4; (e) at positions 34 and/or 95 the amino acid residue is Z5; (f) at positions 22, 25, 29, 43, 44, 66, 74, 87, 102, 108, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.
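The six Z classes used above refine the two B classes used in the preceding paragraphs: Z1 and Z2 together make up B1, Z3 through Z6 together make up B2, and each of the twenty standard amino acids falls into exactly one Z class. A short check of that relationship, with a hypothetical lookup helper:

```python
from typing import Dict, Set

Z_CLASSES: Dict[str, Set[str]] = {
    "Z1": set("AILMV"),
    "Z2": set("FWY"),
    "Z3": set("NQST"),
    "Z4": set("RHK"),
    "Z5": set("DE"),
    "Z6": set("CGP"),
}
B1: Set[str] = set("AILMFWYV")
B2: Set[str] = set("RNDCQEGHKPST")
STANDARD: Set[str] = set("ACDEFGHIKLMNPQRSTVWY")


def z_class(residue: str) -> str:
    """Return the name of the Z class containing a one-letter residue code."""
    for name, members in Z_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"not a standard amino acid: {residue!r}")


# The Z classes refine B1/B2, and B1/B2 split the standard residues.
assert Z_CLASSES["Z1"] | Z_CLASSES["Z2"] == B1
assert Z_CLASSES["Z3"] | Z_CLASSES["Z4"] | Z_CLASSES["Z5"] | Z_CLASSES["Z6"] == B2
assert B1 | B2 == STANDARD and not (B1 & B2)
```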

Some preferred GAT polypeptides of the invention further comprise an amino acid residue at position 36 which is selected from the group consisting of Z1 and Z3. Some preferred GAT polypeptides of the invention further comprise an amino acid residue at position 64 which is selected from the group consisting of Z1 and Z2.

Some preferred GAT polypeptides of the invention further comprise amino acid residues in the amino acid sequence that correspond to the following positions, wherein at least 80% conform to the following restrictions: (a) at position 2 the amino acid residue is I or L; (b) at position 3 the amino acid residue is E; (c) at position 4 the amino acid residue is V or I; (d) at position 5 the amino acid residue is K; (e) at position 6 the amino acid residue is P; (f) at position 8 the amino acid residue is N; (g) at position 10 the amino acid residue is E; (h) at position 11 the amino acid residue is D or E; (i) at position 12 the amino acid residue is T; (j) at position 14 the amino acid residue is E or D; (k) at position 15 the amino acid residue is L; (l) at position 17 the amino acid residue is H; (m) at position 18 the amino acid residue is R, E or K; (n) at position 19 the amino acid residue is I or V; (o) at position 24 the amino acid residue is Q; (p) at position 26 the amino acid residue is M, L, V or I; (q) at position 27 the amino acid residue is E; (r) at position 28 the amino acid residue is A or V; (s) at position 30 the amino acid residue is M; (t) at position 31 the amino acid residue is Y or F; (u) at position 32 the amino acid residue is E or D; (v) at position 33 the amino acid residue is T or S; (w) at position 35 the amino acid residue is L; (x) at position 37 the amino acid residue is R, G, E or Q; (y) at position 39 the amino acid residue is A or S; (z) at position 40 the amino acid residue is F or L; (aa) at position 45 the amino acid residue is Y or F; (ab) at position 47 the amino acid residue is R or G; (ac) at position 48 the amino acid residue is G; (ad) at position 49 the amino acid residue is K, R, or Q; (ae) at position 51 the amino acid residue is I or V; (af) at position 52 the amino acid residue is S; (ag) at position 53 the amino acid residue is I or V; (ah) at position 54 the amino acid residue is A; (ai) at position 57 the amino acid residue is H or N; (aj) at position 58 the amino acid residue is Q, K, R or P; (ak) at position 59 the amino acid residue is A; (al) at position 60 the amino acid residue is E; (am) at position 61 the amino acid residue is H or R; (an) at position 63 the amino acid residue is E or D; (ao) at position 65 the amino acid residue is E, P or Q; (ap) at position 67 the amino acid residue is Q or R; (aq) at position 68 the amino acid residue is K or E; (ar) at position 69 the amino acid residue is Q; (as) at position 79 the amino acid residue is E; (at) at position 80 the amino acid residue is G; (au) at position 81 the amino acid residue is Y, H or F; (av) at position 82 the amino acid residue is R; (aw) at position 83 the amino acid residue is E or D; (ax) at position 84 the amino acid residue is Q; (ay) at position 86 the amino acid residue is A; (az) at position 89 the amino acid residue is G, T or S; (ba) at position 90 the amino acid residue is L; (bb) at position 91 the amino acid residue is L, I or V; (bc) at position 92 the amino acid residue is R or K; (bd) at position 93 the amino acid residue is H; (be) at position 96 the amino acid residue is E or Q; (bf) at position 97 the amino acid residue is I; (bg) at position 100 the amino acid residue is K or N; (bh) at position 101 the amino acid residue is K or R; (bi) at position 103 the amino acid residue is A or V; (bj) at position 104 the amino acid residue is D; (bk) at position 105 the amino acid residue is M, L or I; (bl) at position 106 the amino acid residue is L; (bm) at position 112 the amino acid residue is T or A; (bn) at position 113 the amino acid residue is S or T; (bo) at position 114 the amino acid residue is A; (bp) at position 115 the amino acid residue is S; (bq) at position 119 the amino acid residue is K or R; (br) at position 120 the amino acid residue is K or R; (bs) at position 123 the amino acid residue is F or L; (bt) at position 125 the amino acid residue is E; (bu) at position 126 the amino acid residue is Q or H; (bv) at position 128 the amino acid residue is E or D; (bw) at position 129 the amino acid residue is V or I; (bx) at position 130 the amino acid residue is F; (by) at position 131 the amino acid residue is D or E; (bz) at position 132 the amino acid residue is T; (ca) at position 135 the amino acid residue is V; (cb) at position 138 the amino acid residue is H; (cc) at position 139 the amino acid residue is I; (cd) at position 140 the amino acid residue is L or M; (ce) at position 142 the amino acid residue is Y; (cf) at position 143 the amino acid residue is K or R; (cg) at position 145 the amino acid residue is L or I; and (ch) at position 146 the amino acid residue is T.
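Long enumerations like the one above are easier to apply programmatically once they are turned into a position-to-allowed-residues table. A sketch of such a parser using Python's `re` module; it is hypothetical and handles only the singular “at position N the amino acid residue is X, Y or Z” clauses used in the paragraph above (plural “at positions” clauses elsewhere would need an extended pattern).

```python
import re
from typing import Dict, Set

# Clauses of the form "at position N the amino acid residue is X, Y or Z", up to the next ";".
RULE = re.compile(r"at position\s+(\d+)\s+the amino acid\s+residue is\s+([^;]+)")
# Standard one-letter codes standing alone, so words such as "or" are skipped.
RESIDUE = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])\b")


def parse_position_rules(text: str) -> Dict[int, Set[str]]:
    """Map each reference position to the set of residues the text allows there."""
    rules: Dict[int, Set[str]] = {}
    for pos, allowed in RULE.findall(text):
        rules[int(pos)] = set(RESIDUE.findall(allowed))
    return rules


excerpt = ("(a) at position 2 the amino acid residue is I or L; "
           "(m) at position 18 the amino acid residue is R, E or K;")
print(parse_position_rules(excerpt))
```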

Some preferred GAT polypeptides of the invention further comprise amino acid residues in the amino acid sequence that correspond to the positions specified in (a)-(ch) above, wherein at least 90% conform to the amino acid residue restrictions specified in (a)-(ch).

Some preferred GAT polypeptides of the invention can be optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, wherein at least one of the following positions conforms to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue; wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P, and further wherein of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following restrictions: (a) at positions 9, 76, 94 and 110 the amino acid residue is A; (b) at positions 29 and 108 the amino acid residue is C; (c) at position 34 the amino acid residue is D; (d) at position 95 the amino acid residue is E; (e) at position 56 the amino acid residue is F; (f) at positions 43, 44, 66, 74, 87, 102, 116, 122, 127 and 136 the amino acid residue is G; (g) at position 41 the amino acid residue is H; (h) at position 7 the amino acid residue is I; (i) at position 85 the amino acid residue is K; (j) at positions 20, 42, 50, 78 and 121 the amino acid residue is L; (k) at positions 1 and 141 the amino acid residue is M; (l) at positions 23 and 109 the amino acid residue is N; (m) at positions 22, 25, 133, 134 and 137 the amino acid residue is P; (n) at position 71 the amino acid residue is Q; (o) at positions 16, 21, 73, 99 and 111 the amino acid residue is R; (p) at position 55 the amino acid residue is S; (q) at position 77 the amino acid residue is T; (r) at position 107 the amino acid residue is W; and (s) at positions 13, 46, 70 and 118 the amino acid residue is Y.

Some preferred GAT polypeptides of the invention further comprise amino acid sequences wherein the amino acid residues meet at least one of the following restrictions: (a) at position 36 the amino acid residue is M, L, or T; (b) at position 72 the amino acid residue is L or I; (c) at position 75 the amino acid residue is M or V; (d) at position 64 the amino acid residue is L, I, or F; (e) at position 88 the amino acid residue is T or S; and (f) at position 117 the amino acid residue is Y or F.

Some preferred GAT polypeptides of the invention comprise an amino acid sequence wherein the amino acid residues meet at least one of the following additional restrictions: (a) at position 14 the amino acid residue is D; (b) at position 18 the amino acid residue is E; (c) at position 26 the amino acid residue is M or V; (e) at position 30 the amino acid residue is I; (f) at position 32 the amino acid residue is D; (g) at position 36 the amino acid residue is M or T; (i) at position 37 the amino acid residue is C; (j) at position 38 the amino acid residue is D; (j) at position 53 the amino acid residue is V; (k) at position 58 the amino acid residue is R; (l) at position 61 the amino acid residue is R; (m) at position 62 the amino acid residue is L; (n) at position 64 the amino acid residue is I or F; (o) at position 65 the amino acid residue is P; (p) at position 72 the amino acid residue is I; (q) at position 75 the amino acid residue is V; (r) at position 88 the amino acid residue is T; (s) at position 89 the amino acid residue is G; (t) at position 91 the amino acid residue is L; (u) at position 98 the amino acid residue is I; (v) at position 105 the amino acid residue is I; (w) at position 112 the amino acid residue is A; (x) at position 124 the amino acid residue is G or C; (y) at position 128 the amino acid residue is D; (z) at position 140 the amino acid residue is M; (aa) at position 143 the amino acid residue is R; and (ab) at position 144 the amino acid residue is W.

Some preferred GAT polypeptides of the invention comprise an amino acid sequence wherein, of the amino acid residues that correspond to the positions specified in (a) through (ab) as described above, at least 80% conform to the amino acid residue restrictions specified in (a) through (ab).

Some preferred GAT polypeptides of the invention have an amino acid sequence that comprises amino acid residues at least one of which meets the following additional restrictions: (a) at position 41 the amino acid residue is H; (b) at position 138 the amino acid residue is H; (c) at position 34 the amino acid residue is N; and (d) at position 55 the amino acid residue is S.

Some preferred GAT polypeptides of the invention comprise an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590.

Some preferred GAT polypeptides of the invention comprise an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590, wherein at least one of the following positions further conforms to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P.

Some preferred GAT polypeptides of the invention comprise an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590, wherein of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 90% conform to the following additional restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 31, 45, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 123, 129, 139, and/or 145 the amino acid residue is B1; and (b) at positions 3, 5, 8, 10, 11, 14, 17, 24, 27, 32, 37, 47, 48, 49, 52, 57, 58, 61, 63, 68, 69, 79, 80, 82, 83, 89, 92, 100, 101, 104, 119, 120, 125, 126, 128, 131, and/or 143 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.
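
The 90% conformance rule amounts to a count over the listed positions. The sketch below returns the fraction of the 59 listed positions (22 for B1, 37 for B2) that conform. As before, it assumes residues are numbered like the reference, and the names are hypothetical. B1 and B2 together cover all twenty standard residues, so each listed position is effectively a binary choice between the two classes.

```python
"""Sketch: fraction of the listed positions that conform to the B1/B2 restrictions."""

B1 = set("AILMFWYV")
B2 = set("RNDCQEGHKPST")

B1_POSITIONS = [2, 4, 15, 19, 26, 28, 31, 45, 51, 54, 86, 90, 91, 97,
                103, 105, 106, 114, 123, 129, 139, 145]
B2_POSITIONS = [3, 5, 8, 10, 11, 14, 17, 24, 27, 32, 37, 47, 48, 49, 52,
                57, 58, 61, 63, 68, 69, 79, 80, 82, 83, 89, 92, 100, 101,
                104, 119, 120, 125, 126, 128, 131, 143]


def conforming_fraction(seq: str) -> float:
    """Share of listed positions (1-based) whose residue is in the required class."""
    checks = [(p, B1) for p in B1_POSITIONS] + [(p, B2) for p in B2_POSITIONS]
    ok = sum(1 for p, allowed in checks if seq[p - 1] in allowed)
    return ok / len(checks)


def meets_b1_b2(seq: str, minimum: float = 0.90) -> bool:
    return conforming_fraction(seq) >= minimum
```

For example, a hypothetical poly-Gly sequence conforms only at the 37 B2 positions, giving 37/59, which is well below the 90% threshold.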

Some preferred GAT polypeptides of the invention comprise an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590, wherein of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following additional restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 129, 139, and/or 145 the amino acid residue is Z1; (b) at positions 31 and/or 45 the amino acid residue is Z2; (c) at position 8 the amino acid residue is Z3; (d) at position 89 the amino acid residue is Z3 or Z6; (e) at positions 82, 92, 101, and/or 120 the amino acid residue is Z4; (f) at positions 3, 11, 27, and/or 79 the amino acid residue is Z5; (g) at position 18 the amino acid residue is Z4 or Z5; (h) at position 123 the amino acid residue is Z1 or Z2; (i) at positions 12, 33, 35, 39, 53, 59, 112, 132, 135, 140, and/or 146 the amino acid residue is Z1 or Z3; (j) at position 30 the amino acid residue is Z1; (k) at position 6 the amino acid residue is Z6; (l) at position 81 the amino acid residue is Z2 or Z4; (m) at position 113 the amino acid residue is Z3; (n) at position 138 the amino acid residue is Z4; (o) at position 142 the amino acid residue is Z2; (p) at positions 57 and/or 126 the amino acid residue is Z3 or Z4; (q) at positions 5, 17, and 61 the amino acid residue is Z4; (r) at position 24 the amino acid residue is Z3; (s) at position 104 the amino acid residue is Z5; (t) at positions 52 and/or 69 the amino acid residue is Z3; (u) at positions 14 and/or 119 the amino acid residue is Z5; (v) at positions 10, 32, 63, and/or 83 the amino acid residue is Z5; (w) at positions 48 and/or 80 the amino acid residue is Z6; (x) at position 40 the amino acid residue is Z1 or Z2; (y) at position 96 the amino acid residue is Z3 or Z5; (z) at position 65 the amino acid residue is Z3, Z4, or Z6; (aa) at positions 84 and/or 115 the amino acid residue is Z3; (ab) at position 93 the amino acid residue is Z4; (ac) at position 130 the amino acid residue is Z2; (ad) at position 58 the amino acid residue is Z3, Z4, or Z6; (ae) at position 47 the amino acid residue is Z4 or Z6; (af) at positions 49 and/or 100 the amino acid residue is Z3 or Z4; (ag) at position 68 the amino acid residue is Z4 or Z5; (ah) at position 143 the amino acid residue is Z4; (ai) at position 131 the amino acid residue is Z5; (aj) at positions 125 and/or 128 the amino acid residue is Z5; (ak) at position 67 the amino acid residue is Z3 or Z4; (al) at position 60 the amino acid residue is Z5; and (am) at position 37 the amino acid residue is Z4 or Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.
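
The thirty-nine clauses (a) through (am) can be encoded compactly by grouping positions according to their set of allowed Z classes. The sketch below does this and computes the fraction of the 86 listed positions that conform, for comparison against the 80% threshold (or 90%, as in the next paragraph). It assumes reference numbering without alignment gaps. The grouping keys and function names are illustrative only.

```python
"""Sketch: conformance to the Z-class restrictions (a)-(am)."""

# Residue classes as defined in the text.
Z = {
    "1": set("AILMV"), "2": set("FWY"), "3": set("NQST"),
    "4": set("RHK"), "5": set("DE"), "6": set("CGP"),
}

# Key = allowed classes (e.g. "13" means Z1 or Z3); value = 1-based positions.
RULES = {
    "1": [2, 4, 15, 19, 26, 28, 30, 51, 54, 86, 90, 91, 97, 103, 105, 106,
          114, 129, 139, 145],
    "2": [31, 45, 130, 142],
    "3": [8, 24, 52, 69, 84, 113, 115],
    "4": [5, 17, 61, 82, 92, 93, 101, 120, 138, 143],
    "5": [3, 10, 11, 14, 27, 32, 60, 63, 79, 83, 104, 119, 125, 128, 131],
    "6": [6, 48, 80],
    "12": [40, 123],
    "13": [12, 33, 35, 39, 53, 59, 112, 132, 135, 140, 146],
    "24": [81],
    "34": [49, 57, 67, 100, 126],
    "35": [96],
    "36": [89],
    "45": [18, 68],
    "46": [37, 47],
    "346": [58, 65],
}


def position_rules() -> dict[int, set[str]]:
    """Expand RULES into a map from position to the set of allowed residues."""
    return {
        pos: set().union(*(Z[c] for c in classes))
        for classes, positions in RULES.items()
        for pos in positions
    }


def conforming_fraction(seq: str) -> float:
    rules = position_rules()
    ok = sum(1 for pos, allowed in rules.items() if seq[pos - 1] in allowed)
    return ok / len(rules)


def meets_z_restrictions(seq: str, minimum: float = 0.80) -> bool:
    return conforming_fraction(seq) >= minimum
```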

Some preferred GAT polypeptides of the invention further comprise amino acid residues in the amino acid sequence that correspond to the positions specified in (a)-(am), wherein at least 90% conform to the amino acid residue restrictions specified in (a)-(am).

Some preferred GAT polypeptides of the invention comprise amino acid residues in the amino acid sequence that correspond to the following positions, wherein at least 90% conform to the following additional restrictions: (a) at positions 1, 7, 9, 13, 20, 36, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred GAT polypeptides of the invention comprise amino acid residues in the amino acid sequence that correspond to the following positions, wherein at least 90% conform to the following additional restrictions: (a) at positions 1, 7, 9, 13, 20, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 36, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred GAT polypeptides of the invention comprise amino acid residues in the amino acid sequence that correspond to the following positions, wherein at least 90% conform to the following additional restrictions: (a) at positions 1, 7, 9, 20, 42, 50, 72, 75, 76, 78, 94, 98, 110, 121, and/or 141 the amino acid residue is Z1; (b) at positions 13, 46, 56, 70, 107, 117, and/or 118 the amino acid residue is Z2; (c) at positions 23, 55, 71, 77, 88, and/or 109 the amino acid residue is Z3; (d) at positions 16, 21, 41, 73, 85, 99, and/or 111 the amino acid residue is Z4; (e) at positions 34 and/or 95 the amino acid residue is Z5; (f) at positions 22, 25, 29, 43, 44, 66, 74, 87, 102, 108, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.

Some preferred GAT polypeptides of the invention further comprise an amino acid sequence wherein the amino acid residue at position 36 is selected from the group consisting of Z1 and Z3. Some preferred GAT polypeptides of the invention further comprise an amino acid sequence wherein the amino acid residue at position 64 is selected from the group consisting of Z1 and Z2.

Some preferred GAT polypeptides of the invention comprise an amino acid sequence wherein, of the amino acid residues that correspond to the following positions, at least 80% conform to the following additional restrictions: (a) at position 2 the amino acid residue is I or L; (b) at position 3 the amino acid residue is E; (c) at position 4 the amino acid residue is V or I; (d) at position 5 the amino acid residue is K; (e) at position 6 the amino acid residue is P; (f) at position 8 the amino acid residue is N; (g) at position 10 the amino acid residue is E; (h) at position 11 the amino acid residue is D or E; (i) at position 12 the amino acid residue is T; (j) at position 14 the amino acid residue is E or D; (k) at position 15 the amino acid residue is L; (l) at position 17 the amino acid residue is H; (m) at position 18 the amino acid residue is R, E, or K; (n) at position 19 the amino acid residue is I or V; (o) at position 24 the amino acid residue is Q; (p) at position 26 the amino acid residue is M, L, V, or I; (q) at position 27 the amino acid residue is E; (r) at position 28 the amino acid residue is A or V; (s) at position 30 the amino acid residue is M; (t) at position 31 the amino acid residue is Y or F; (u) at position 32 the amino acid residue is E or D; (v) at position 33 the amino acid residue is T or S; (w) at position 35 the amino acid residue is L; (x) at position 37 the amino acid residue is R, G, E, or Q; (y) at position 39 the amino acid residue is A or S; (z) at position 40 the amino acid residue is F or L; (aa) at position 45 the amino acid residue is Y or F; (ab) at position 47 the amino acid residue is R or G; (ac) at position 48 the amino acid residue is G; (ad) at position 49 the amino acid residue is K, R, or Q; (ae) at position 51 the amino acid residue is I or V; (af) at position 52 the amino acid residue is S; (ag) at position 53 the amino acid residue is I or V; (ah) at position 54 the amino acid residue is A; (ai) at position 57 the amino acid residue is H or N; (aj) at position 58 the amino acid residue is Q, K, R, or P; (ak) at position 59 the amino acid residue is A; (al) at position 60 the amino acid residue is E; (am) at position 61 the amino acid residue is H or R; (an) at position 63 the amino acid residue is E or D; (ao) at position 65 the amino acid residue is E, P, or Q; (ap) at position 67 the amino acid residue is Q or R; (aq) at position 68 the amino acid residue is K or E; (ar) at position 69 the amino acid residue is Q; (as) at position 79 the amino acid residue is E; (at) at position 80 the amino acid residue is G; (au) at position 81 the amino acid residue is Y, H, or F; (av) at position 82 the amino acid residue is R; (aw) at position 83 the amino acid residue is E or D; (ax) at position 84 the amino acid residue is Q; (ay) at position 86 the amino acid residue is A; (az) at position 89 the amino acid residue is G, T, or S; (ba) at position 90 the amino acid residue is L; (bb) at position 91 the amino acid residue is L, I, or V; (bc) at position 92 the amino acid residue is R or K; (bd) at position 93 the amino acid residue is H; (be) at position 96 the amino acid residue is E or Q; (bf) at position 97 the amino acid residue is I; (bg) at position 100 the amino acid residue is K or N; (bh) at position 101 the amino acid residue is K or R; (bi) at position 103 the amino acid residue is A or V; (bj) at position 104 the amino acid residue is D; (bk) at position 105 the amino acid residue is M, L, or I; (bl) at position 106 the amino acid residue is L; (bm) at position 112 the amino acid residue is T or A; (bn) at position 113 the amino acid residue is S or T; (bo) at position 114 the amino acid residue is A; (bp) at position 115 the amino acid residue is S; (bq) at position 119 the amino acid residue is K or R; (br) at position 120 the amino acid residue is K or R; (bs) at position 123 the amino acid residue is F or L; (bt) at position 125 the amino acid residue is E; (bu) at position 126 the amino acid residue is Q or H; (bv) at position 128 the amino acid residue is E or D; (bw) at position 129 the amino acid residue is V or I; (bx) at position 130 the amino acid residue is F; (by) at position 131 the amino acid residue is D or E; (bz) at position 132 the amino acid residue is T; (ca) at position 135 the amino acid residue is V; (cb) at position 138 the amino acid residue is H; (cc) at position 139 the amino acid residue is I; (cd) at position 140 the amino acid residue is L or M; (ce) at position 142 the amino acid residue is Y; (cf) at position 143 the amino acid residue is K or R; (cg) at position 145 the amino acid residue is L or I; and (ch) at position 146 the amino acid residue is T.

Some preferred GAT polypeptides of the invention comprise an amino acid sequence in which, of the residues that correspond to the positions specified in (a)-(ch) above, at least 90% conform to the amino acid residue restrictions specified in (a)-(ch).

Some preferred GAT polypeptides of the invention comprise an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:590, further wherein of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following restrictions: (a) at positions 9, 76, 94, and 110 the amino acid residue is A; (b) at positions 29 and 108 the amino acid residue is C; (c) at position 34 the amino acid residue is D; (d) at position 95 the amino acid residue is E; (e) at position 56 the amino acid residue is F; (f) at positions 43, 44, 66, 74, 87, 102, 116, 122, 127, and 136 the amino acid residue is G; (g) at position 41 the amino acid residue is H; (h) at position 7 the amino acid residue is I; (i) at position 85 the amino acid residue is K; (j) at positions 20, 42, 50, 78, and 121 the amino acid residue is L; (k) at positions 1 and 141 the amino acid residue is M; (l) at positions 23 and 109 the amino acid residue is N; (m) at positions 22, 25, 133, 134, and 137 the amino acid residue is P; (n) at position 71 the amino acid residue is Q; (o) at positions 16, 21, 73, 99, and 111 the amino acid residue is R; (p) at position 55 the amino acid residue is S; (q) at position 77 the amino acid residue is T; (r) at position 107 the amino acid residue is W; and (s) at positions 13, 46, 70, and 118 the amino acid residue is Y.
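
Clauses (a) through (s) each name one specific residue, so they can be stored by residue and then inverted into a map from position to residue. This is a minimal sketch under the same assumption of reference numbering. The 49 positions are the ones listed above, and the function names are hypothetical.

```python
"""Sketch: the residue-specific restrictions (a)-(s), keyed by residue."""

REQUIRED = {
    "A": [9, 76, 94, 110], "C": [29, 108], "D": [34], "E": [95], "F": [56],
    "G": [43, 44, 66, 74, 87, 102, 116, 122, 127, 136], "H": [41], "I": [7],
    "K": [85], "L": [20, 42, 50, 78, 121], "M": [1, 141], "N": [23, 109],
    "P": [22, 25, 133, 134, 137], "Q": [71], "R": [16, 21, 73, 99, 111],
    "S": [55], "T": [77], "W": [107], "Y": [13, 46, 70, 118],
}

# Invert to position -> required residue (1-based, reference numbering).
BY_POSITION = {pos: aa for aa, positions in REQUIRED.items() for pos in positions}


def conforming_fraction(seq: str) -> float:
    ok = sum(1 for pos, aa in BY_POSITION.items() if seq[pos - 1] == aa)
    return ok / len(BY_POSITION)


def meets_restrictions(seq: str, minimum: float = 0.80) -> bool:
    return conforming_fraction(seq) >= minimum
```

With 49 positions, an 80% threshold allows up to nine positions to deviate (40/49 is about 81.6%, while 39/49 is about 79.6%).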

Some preferred GAT polypeptides of the invention further comprise an amino acid sequence in which at least one of the following criteria is met: (a) at position 14 the amino acid residue is D; (b) at position 18 the amino acid residue is E; (c) at position 26 the amino acid residue is M or V; (e) at position 30 the amino acid residue is I; (f) at position 32 the amino acid residue is D; (g) at position 36 the amino acid residue is M or T; (i) at position 37 the amino acid residue is C; (j) at position 38 the amino acid residue is D; (j) at position 53 the amino acid residue is V; (k) at position 58 the amino acid residue is R; (l) at position 61 the amino acid residue is R; (m) at position 62 the amino acid residue is L; (n) at position 64 the amino acid residue is I or F; (o) at position 65 the amino acid residue is P; (p) at position 72 the amino acid residue is I; (q) at position 75 the amino acid residue is V; (r) at position 88 the amino acid residue is T; (s) at position 89 the amino acid residue is G; (t) at position 91 the amino acid residue is L; (u) at position 98 the amino acid residue is I; (v) at position 105 the amino acid residue is I; (w) at position 112 the amino acid residue is A; (x) at position 124 the amino acid residue is G or C; (y) at position 128 the amino acid residue is D; (z) at position 140 the amino acid residue is M; (aa) at position 143 the amino acid residue is R; and (ab) at position 144 the amino acid residue is W.

Some preferred GAT polypeptides of the invention further comprise an amino acid sequence wherein, of the amino acid residues that correspond to the positions specified in (a) through (ab) as described above, at least 80% conform to the amino acid residue restrictions specified in (a) through (ab).

Some preferred GAT polypeptides of the invention further comprise an amino acid sequence wherein the following conditions are also met: (a) at position 41 the amino acid residue is H; (b) at position 138 the amino acid residue is H; (c) at position 34 the amino acid residue is N; and (d) at position 55 the amino acid residue is S.

Some preferred GAT polypeptides of the invention, when optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, have amino acid sequences such that one or more of the following positions conform to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P. In certain of the aforementioned GAT polypeptides, the amino acid residue in the polypeptide corresponding to position 28 is V, I, or A. Valine or isoleucine at position 28 generally correlates with reduced K_M, while alanine at that position generally correlates with increased k_cat. Threonine at position 89 and arginine at position 58 generally correlate with reduced K_M. Other preferred GAT polypeptides are characterized by having I27 (i.e., an I at position 27), M30, D34, S35, R37, S39, H41, G48, K49, N57, Q58, P62, T62, Q65, Q67, K68, V75, E83, S89, A96, E96, R101, T112, A114, K119, K120, E128, V129, D131, T131, V132, V134, V135, H138, R144, I145, or T146, or any combination thereof.
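
The kinetic statements above can be related to the standard Michaelis-Menten rate law. This is an interpretive aid and not part of the original disclosure. It assumes single-substrate Michaelis-Menten behavior with respect to glyphosate and treats the acetyl donor as saturating. In the expression below, v is the reaction rate, [E]_0 is the total enzyme concentration, and [S] is the glyphosate concentration:

$$
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_M + [S]}
\qquad\Longrightarrow\qquad
v \approx \frac{k_{\mathrm{cat}}}{K_M}\,[E]_0\,[S] \quad \text{for } [S] \ll K_M
$$

Under this model, when glyphosate concentrations are well below K_M, the rate is proportional to the ratio k_cat/K_M. A reduced K_M and an increased k_cat therefore both raise the rate at a given glyphosate concentration.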

Some preferred GAT polypeptides of the invention comprise an amino acid sequence selected from the group consisting of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823 and 825.

In another aspect, the invention provides an isolated or recombinant polypeptide that comprises at least 20, or alternatively, at least 50, at least 75, at least 100, at least 125, at least 130, at least 135, at least 140, at least 141, at least 142, at least 143, at least 144, or at least 145 contiguous amino acids of an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 96% identical to SEQ ID NO:919 (such as, for example, SEQ ID NO:917, 919, 921, 923, 925, 927, 833, 835, 839, 843, 845, 859, 863, 873, 877, 891, 895, 901, 905, 907, 913, 915, or 950); (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:929 (such as, for example, SEQ ID NO:929, 931, 835, 843, 849, or 867); (c) an amino acid sequence that is at least 98% identical to SEQ ID NO:847 (such as, for example, SEQ ID NO:845 or 847); (d) an amino acid sequence that is at least 98% identical to SEQ ID NO: δ 1; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:853; (f) an amino acid sequence that is at least 98% identical to SEQ ID NO:855 (such as, for example, SEQ ID NO:835 or 855); (g) an amino acid sequence that is at least 98% identical to SEQ ID NO:857; (h) an amino acid sequence that is at least 98% identical to SEQ ID NO:861 (such as, for example, SEQ ID NO:839, 861, or 883); (i) an amino acid sequence that is at least 98% identical to SEQ ID NO:871; (j) an amino acid sequence that is at least 98% identical to SEQ ID NO:875; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:881; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:885 (such as, for example, SEQ ID NO:845 or 885); (m) an amino acid sequence that is at least 98% identical to SEQ ID NO:887; (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:889 (such as, for example, SEQ ID NO:863, 889, 891, or 903); (o) an amino acid sequence that is at least 98% identical to SEQ ID NO:893; (p) an amino acid sequence that is at least 98% identical to SEQ ID NO:897; (q) an amino acid sequence that is at least 98% identical to SEQ ID NO:899; (r) an amino acid sequence that is at least 98% identical to SEQ ID NO:909 (such as, for example, SEQ ID NO:883 or 909); (s) an amino acid sequence that is at least 98% identical to SEQ ID NO:911; (t) an amino acid sequence that is at least 99% identical to SEQ ID NO:837; (u) an amino acid sequence that is at least 99% identical to SEQ ID NO:841; (v) an amino acid sequence that is at least 99% identical to SEQ ID NO:865; (w) an amino acid sequence that is at least 99% identical to SEQ ID NO:869; and (x) an amino acid sequence that is at least 99% identical to SEQ ID NO:879.
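
The "at least N contiguous amino acids" language is a separate test from the identity clauses. The sketch below covers only that contiguous part. It uses a standard longest-common-substring dynamic program to report the longest exact run that a query shares with a reference, and it assumes the reference has already been shown to satisfy one of clauses (a) through (x). The function names and the default of 20 are illustrative.

```python
"""Sketch: does a query share a contiguous run of >= N residues with a reference?"""


def longest_shared_run(query: str, reference: str) -> int:
    """Length of the longest common contiguous substring (dynamic programming)."""
    best = 0
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            if query[i - 1] == reference[j - 1]:
                # Extend the exact run that ended one residue earlier in both sequences.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def has_contiguous_fragment(query: str, reference: str, min_len: int = 20) -> bool:
    return longest_shared_run(query, reference) >= min_len
```

For full-length GAT sequences of about 146 residues, the quadratic table is small. The thresholds listed above (20, 50, ..., 145) map directly onto `min_len`.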

In another aspect, the invention provides an isolated or recombinant polypeptide that comprises at least 20, or alternatively, at least 50, at least 75, at least 100, at least 125, at least 130, at least 135, at least 140, at least 141, at least 142, at least 143, at least 144, or at least 145 contiguous amino acids of an amino acid sequence that is at least 95% identical to SEQ ID NO:929 and which comprises a Gly or an Asn residue at the amino acid position corresponding to position 33 of SEQ ID NO:929 (such as, for example, SEQ ID NO:837, 849, 893, 897, 905, 921, 927, 929, or 931).

In another aspect, the invention provides a polypeptide comprising residues 2-146 of an amino acid sequence selected from the group consisting of SEQ ID NO: 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972. In some embodiments of the invention, the amino acid sequence of the polypeptide comprises Met, Met-Ala, or Met-Ala-Ala on the N-terminal side of the amino acid corresponding to position 2 of the reference amino acid sequence.
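
The residue 2-146 core and its optional N-terminal extensions can be enumerated directly. The sketch below is a minimal version of this. It includes the bare core because a polypeptide "comprising residues 2-146" covers it, and it adds the Met, Met-Ala, and Met-Ala-Ala forms named in the text. The function names are hypothetical.

```python
"""Sketch: residues 2-146 of a reference plus the optional N-terminal extensions."""

# "" = the bare 2-146 core; the others are the Met, Met-Ala and Met-Ala-Ala options.
N_TERMINAL_PREFIXES = ("", "M", "MA", "MAA")


def core_2_to_146(reference: str) -> str:
    """Residues 2-146 (1-based, inclusive) of the reference sequence."""
    if len(reference) < 146:
        raise ValueError("reference must contain at least 146 residues")
    return reference[1:146]


def n_terminal_variants(reference: str) -> list[str]:
    core = core_2_to_146(reference)
    return [prefix + core for prefix in N_TERMINAL_PREFIXES]
```

If the reference itself begins with Met, the "M" variant reproduces the original 146-residue sequence. The "MA" and "MAA" variants insert one or two extra alanines ahead of position 2.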

Some preferred GAT polypeptides of the invention comprise an amino acid sequence selected from the group consisting of SEQ ID NO: 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 946, 948, and 950.

The invention further provides preferred GAT polypeptides that are characterized by a combination of the foregoing amino acid residue position restrictions.

In addition, the invention provides GAT polynucleotides encoding the preferred GAT polypeptides described above, and complementary nucleotide sequences thereof.

Some aspects of the invention pertain particularly to the subset of any of the above-described categories of GAT polypeptides having GAT activity, as described herein. These GAT polypeptides are preferred, for example, for use as agents for conferring glyphosate resistance upon a plant. Examples of desired levels of GAT activity are described herein.

In one aspect, the GAT polypeptides comprise an amino acid sequence encoded by a recombinant or isolated form of naturally occurring nucleic acids isolated from a natural source, e.g., a bacterial strain. Wild-type polynucleotides encoding such GAT polypeptides may be specifically screened for by standard techniques known in the art.

The polypeptides defined by SEQ ID NO:6-10, for example, were discovered by expression cloning of sequences from Bacillus strains exhibiting GAT activity, as described in more detail below.

The invention also includes isolated or recombinant polypeptides which are encoded by an isolated or recombinant polynucleotide comprising a nucleotide sequence which hybridizes under stringent conditions over substantially the entire length of a nucleotide sequence selected from the group consisting of SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, and 824, their complements, and nucleotide sequences encoding an amino acid sequence selected from the group consisting of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, and 825, including their complements.

The invention also includes isolated or recombinant polypeptides which are encoded by an isolated or recombinant polynucleotide comprising a nucleotide sequence which hybridizes under stringent conditions over substantially the entire length of a nucleotide sequence selected from the group consisting of SEQ ID NO: 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, and 930, their complements, and nucleotide sequences encoding an amino acid sequence selected from the group consisting of SEQ ID NO: 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972.

The invention further includes any polypeptide having GAT activity that is encoded by a fragment of any of the GAT-encoding polynucleotides described herein.

The invention also provides fragments of GAT polypeptides that can be spliced together to form a functional GAT polypeptide. Splicing can be accomplished in vitro or in vivo, and can involve cis- or trans-splicing (i.e., intramolecular or intermolecular splicing). The fragments themselves can, but need not, have GAT activity. For example, two or more segments of a GAT polypeptide can be separated by inteins; removal of the intein sequence by cis-splicing results in a functional GAT polypeptide. In another example, an encrypted GAT polypeptide can be expressed as two or more separate fragments; trans-splicing of these segments results in recovery of a functional GAT polypeptide. Various aspects of cis- and trans-splicing, gene encryption, and introduction of intervening sequences are described in more detail in U.S. patent application Ser. Nos. 09/517,933 and 09/710,686, both of which are incorporated by reference herein in their entirety.

In general, the invention includes any polypeptide encoded by a modified GAT polynucleotide derived by mutation, recursive sequence recombination, and/or diversification of the polynucleotide sequences described herein. In some aspects of the invention, a GAT polypeptide is modified by single or multiple amino acid substitutions, a deletion, an insertion, or a combination of one or more of these types of modifications. Substitutions can be conservative or non-conservative, can alter function or not, and can add new function. Insertions and deletions can be substantial, as in the truncation of a substantial fragment of the sequence or the fusion of additional sequence, either internally or at the N- or C-terminus. In some embodiments of the invention, a GAT polypeptide is part of a fusion protein comprising a functional addition such as, for example, a secretion signal, a chloroplast transit peptide, a purification tag, or any of the numerous other functional groups that will be apparent to the skilled artisan and which are described in more detail elsewhere in this specification.

Polypeptides of the invention may contain one or more modified amino acids. The presence of modified amino acids may be advantageous in, for example, (a) increasing polypeptide in vivo half-life, (b) reducing or increasing polypeptide antigenicity, and (c) increasing polypeptide storage stability. Amino acids are modified, for example, co-translationally or post-translationally during recombinant production (e.g., N-linked glycosylation at N-X-S/T motifs during expression in mammalian cells) or by synthetic means.

Non-limiting examples of a modified amino acid include a glycosylated amino acid, a sulfated amino acid, a prenylated (e.g., farnesylated, geranylgeranylated) amino acid, an acetylated amino acid, an acylated amino acid, a PEGylated amino acid, a biotinylated amino acid, a carboxylated amino acid, a phosphorylated amino acid, and the like. References adequate to guide one of skill in the modification of amino acids are replete throughout the literature. Example protocols are found in Walker (1998) Protein Protocols on CD-ROM (Humana Press, Totowa, N.J.).

Recombinant methods for producing and isolating GAT polypeptides of the invention are described herein. In addition to recombinant production, the polypeptides may be produced by direct peptide synthesis using solid-phase techniques (e.g., Stewart et al. (1969) Solid-Phase Peptide Synthesis (WH Freeman Co, San Francisco); and Merrifield (1963) J. Am. Chem. Soc. 85: 2149-2154). Peptide synthesis may be performed using manual techniques or by automation. Automated synthesis may be achieved, for example, using an Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer, Foster City, Calif.) in accordance with the instructions provided by the manufacturer. Subsequences may also be chemically synthesized separately and combined using chemical methods to provide full-length GAT polypeptides. Peptides can also be ordered from a variety of sources.

In another aspect of the invention, a GAT polypeptide of the invention is used to produce antibodies having, e.g., diagnostic uses related to the activity, distribution, and expression of GAT polypeptides, for example in various tissues of a transgenic plant.

GAT homologue polypeptides for antibody induction do not require biological activity; however, the polypeptide or oligopeptide must be antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least 10 amino acids, preferably at least 15 or 20 amino acids. Short stretches of a GAT polypeptide may be fused with another protein, such as keyhole limpet hemocyanin, and an antibody produced against the chimeric molecule.

Methods of producing polyclonal and monoclonal antibodies are known to those of skill in the art, and many antibodies are available. See, e.g., Coligan (1991) Current Protocols in Immunology (Wiley/Greene, NY); Harlow and Lane (1989) Antibodies: A Laboratory Manual (Cold Spring Harbor Press, NY); Stites et al. (eds.) Basic and Clinical Immunology, 4th ed. (Lange Medical Publications, Los Altos, Calif.), and references cited therein; Goding (1986) Monoclonal Antibodies: Principles and Practice, 2d ed. (Academic Press, New York, N.Y.); and Kohler and Milstein (1975) Nature 256: 495-497. Other suitable techniques for antibody preparation include selection of libraries of recombinant antibodies in phage or similar vectors. See Huse et al. (1989) Science 246: 1275-1281; and Ward et al. (1989) Nature 341: 544-546. Specific monoclonal and polyclonal antibodies and antisera will usually bind with a K_D of at least about 0.1 μM, preferably at least about 0.01 μM or better, and most typically and preferably, 0.001 μM or better.

Additional details of antibody production and engineering techniques can be found in Borrebaeck, ed. (1995) Antibody Engineering, 2nd ed. (Freeman and Company, NY); McCafferty et al. (1996) Antibody Engineering, A Practical Approach (IRL at Oxford Press, Oxford, England); and Paul (1995) Antibody Engineering Protocols (Humana Press, Totowa, N.J.).

Sequence Variations

GAT polypeptides of the present invention include conservatively modified variations of the sequences disclosed herein as SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972.
Such conservatively modified variations comprise substitutions, additions or deletions which alter, add or delete a single amino acid or a small percentage of amino acids (typically less than about 5%, more typically less than about 4%, 2%, or 1%) in any of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972.

For example, a conservatively modified variation (e.g., deletion) of the 146 amino acid polypeptide identified herein as SEQ ID NO:6 will have a length of at least 140 amino acids, preferably at least 141 amino acids, more preferably at least 144 amino acids, and still more preferably at least 145 amino acids, corresponding to a deletion of less than about 5%, 4%, 2% or about 1% or less of the polypeptide sequence.
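The percentages in the preceding example follow from simple arithmetic on the 146-residue length of SEQ ID NO:6. The short sketch below recomputes the fraction deleted for each minimum length recited; the helper name `percent_deleted` is illustrative only.

```python
FULL_LENGTH = 146  # residues in SEQ ID NO:6, as stated in the text


def percent_deleted(variant_length: int, full_length: int = FULL_LENGTH) -> float:
    """Percentage of the reference sequence removed in a deletion variant."""
    if not 0 < variant_length <= full_length:
        raise ValueError("variant length must be between 1 and the full length")
    return 100.0 * (full_length - variant_length) / full_length


# Minimum lengths recited in the text, paired with the threshold each satisfies.
for length, threshold in [(140, 5), (141, 4), (144, 2), (145, 1)]:
    pct = percent_deleted(length)
    print(f"{length} aa: {pct:.2f}% deleted, below {threshold}%: {pct < threshold}")
```

The printed values are 4.11%, 3.42%, 1.37% and 0.68%, each below its stated threshold.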

Another example of a conservatively modified variation (e.g., a “conservatively substituted variation”) of the polypeptide identified herein as SEQ ID NO:6 will contain “conservative substitutions,” according to the six substitution groups set forth in Table 2, in up to about 7 residues (i.e., less than about 5%) of the 146 amino acid polypeptide.
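The substitution criterion above can be expressed as a check on an aligned, equal-length variant. Table 2 is not reproduced in this section, so the sketch below assumes the six conventional conservative-substitution groups (A/S/T, D/E, N/Q, R/K, I/L/M/V, F/Y/W) as a stand-in; those groups, the function names, and the example sequences are assumptions to be checked against Table 2 before use.

```python
# Assumed stand-in for the six substitution groups of Table 2, which is not
# reproduced in this section; verify against Table 2 before relying on it.
SUBSTITUTION_GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]
GROUP_OF = {aa: i for i, group in enumerate(SUBSTITUTION_GROUPS) for aa in group}


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b belong to the same substitution group."""
    return a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b)


def is_conservative_variant(reference: str, variant: str,
                            max_fraction: float = 0.05) -> bool:
    """Equal-length variant whose changes are all conservative and affect
    fewer than max_fraction of the reference positions."""
    if len(reference) != len(variant):
        return False
    changes = [(r, v) for r, v in zip(reference, variant) if r != v]
    if not all(is_conservative(r, v) for r, v in changes):
        return False
    return len(changes) < max_fraction * len(reference)


reference = "MK" + "L" * 144       # hypothetical 146-residue sequence
var7 = "MK" + "I" * 7 + "L" * 137  # 7 L->I changes (about 4.8%)
var8 = "MK" + "I" * 8 + "L" * 136  # 8 changes (about 5.5%)
var_g = "MK" + "G" + "L" * 143     # one L->G change, not conservative here
print(is_conservative_variant(reference, var7))   # True
print(is_conservative_variant(reference, var8))   # False
print(is_conservative_variant(reference, var_g))  # False
```

With a 146-residue reference, the 5% ceiling admits at most 7 changed positions, matching the "up to about 7 residues" figure in the text.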

The GAT polypeptide sequence homologues of the invention, including conservatively substituted sequences, can be present as part of larger polypeptide sequences such as occur in a GAT polypeptide, in a GAT fusion with a signal sequence, e.g., a chloroplast targeting sequence, or upon the addition of one or more domains for purification of the protein (e.g., poly-His segments, FLAG tag segments, etc.). In the latter case, the additional functional domains have little or no effect on the activity of the GAT portion of the protein, or the additional domains can be removed by post-synthesis processing steps, such as treatment with a protease.

Defining Polypeptides by Immunoreactivity

Because the polypeptides of the invention provide a new class of enzymes with a defined activity, i.e., the acetylation and acylation of glyphosate, the polypeptides also provide new structural features which can be recognized, e.g., in immunological assays. The generation of antisera which specifically bind the polypeptides of the invention, as well as the polypeptides which are bound by such antisera, are a feature of the invention.

The invention includes GAT polypeptides that specifically bind to or that are specifically immunoreactive with an antibody or antisera generated against an immunogen comprising an amino acid sequence selected from one or more of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972. To eliminate cross-reactivity with other GAT homologues, the antibody or antiserum is subtracted with available related proteins, such as those represented by the proteins or peptides corresponding to GenBank accession numbers available as of the filing date of this application, and exemplified by CAA70664, Z99109 and Y09476. Where the accession number corresponds to a nucleic acid, a polypeptide encoded by the nucleic acid is generated and used for antibody/antisera subtraction purposes. FIG. 3 tabulates the relative identity between exemplary GAT sequences and the most closely related sequence available in GenBank, YitI.
The function of native YitI has yet to be elucidated, but the enzyme has been shown to possess detectable GAT activity.

In one typical format, the immunoassay uses a polyclonal antiserum which was raised against one or more polypeptides comprising one or more of the sequences corresponding to one or more of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972, or a substantial subsequence thereof (i.e., at least about 30% of the full length sequence provided).
The full set of potential polypeptide immunogens derived from SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972 is collectively referred to below as “the immunogenic polypeptide(s).” The resulting antiserum is optionally selected to have low cross-reactivity against other related sequences, and any such cross-reactivity is removed by immunoabsorption with one or more of the related sequences prior to use of the polyclonal antiserum in the immunoassay.

In order to produce antisera for use in an immunoassay, one or more of the immunogenic polypeptide(s) is produced and purified as described herein. For example, recombinant protein may be produced in a bacterial cell line. An inbred strain of mice (used in this assay because results are more reproducible due to the virtual genetic identity of the mice) is immunized with the immunogenic polypeptide(s) in combination with a standard adjuvant, such as Freund's adjuvant, using a standard mouse immunization protocol (see Harlow and Lane (1988) Antibodies, A Laboratory Manual (Cold Spring Harbor Publications, New York) for a standard description of antibody generation, immunoassay formats and conditions that can be used to determine specific immunoreactivity). Alternatively, one or more synthetic or recombinant polypeptides derived from the sequences disclosed herein are conjugated to a carrier protein and used as an immunogen.

Polyclonal sera are collected and titered against the immunogenic polypeptide(s) in an immunoassay, for example, a solid phase immunoassay with one or more of the immunogenic proteins immobilized on a solid support. Polyclonal antisera with a titer of 10⁶ or greater are selected, pooled and subtracted with related polypeptides, e.g., those identified from GenBank as noted, to produce subtracted, pooled, titered polyclonal antisera.

The subtracted, pooled, titered polyclonal antisera are tested for cross-reactivity against the related polypeptides. Preferably at least two of the immunogenic GATs are used in this determination, preferably in conjunction with at least two related polypeptides, to identify antibodies which are specifically bound by the immunogenic polypeptide(s).

In this comparative assay, discriminatory binding conditions are determined for the subtracted, titered polyclonal antisera which result in at least about a 5- to 10-fold higher signal-to-noise ratio for binding of the titered polyclonal antisera to the immunogenic GAT polypeptides as compared to binding to the related polypeptides. That is, the stringency of the binding reaction is adjusted by the addition of non-specific competitors such as albumin or non-fat dry milk, or by adjusting salt conditions, temperature, or the like. These binding conditions are used in subsequent assays for determining whether a test polypeptide is specifically bound by the pooled, subtracted polyclonal antisera. In particular, a test polypeptide which shows at least a 2- to 5-fold higher signal-to-noise ratio than the control polypeptide under discriminatory binding conditions, and at least about a ½ signal-to-noise ratio as compared to the immunogenic polypeptide(s), shares substantial structural similarity with the immunogenic polypeptide(s) as compared to known GAT, and is therefore a polypeptide of the invention.
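The acceptance criteria in the preceding paragraph can be restated as a small decision function. The sketch assumes signal-to-noise ratios have already been measured under the discriminatory binding conditions; the function name and the use of the lower end of each quoted range (2-fold over control, one-half of the immunogen signal) as default cut-offs are assumptions made for illustration.

```python
def meets_discriminatory_criteria(test_snr: float, control_snr: float,
                                  immunogen_snr: float,
                                  min_fold_over_control: float = 2.0,
                                  min_fraction_of_immunogen: float = 0.5) -> bool:
    """Apply the two signal-to-noise tests described for the comparative assay.

    test_snr      -- S/N of the test polypeptide with the subtracted antisera
    control_snr   -- S/N of the related (control) polypeptide
    immunogen_snr -- S/N of the immunogenic polypeptide(s)
    """
    if min(test_snr, control_snr, immunogen_snr) <= 0:
        raise ValueError("signal-to-noise ratios must be positive")
    above_control = test_snr >= min_fold_over_control * control_snr
    close_to_immunogen = test_snr >= min_fraction_of_immunogen * immunogen_snr
    return above_control and close_to_immunogen


# Hypothetical measurements.
print(meets_discriminatory_criteria(12.0, 3.0, 20.0))  # 4x control, 0.6x immunogen: True
print(meets_discriminatory_criteria(12.0, 8.0, 20.0))  # only 1.5x control: False
```

Raising `min_fold_over_control` toward 5 applies the stricter end of the 2- to 5-fold range.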

In another example, immunoassays in the competitive binding format are used for the detection of a test polypeptide. For example, as noted, cross-reacting antibodies are removed from the pooled antisera mixture by immunoabsorption with the control GAT polypeptides. The immunogenic polypeptide(s) are then immobilized to a solid support which is exposed to the subtracted pooled antisera. Test proteins are added to the assay to compete for binding to the pooled, subtracted antisera. The ability of the test protein(s) to compete for binding to the pooled, subtracted antisera as compared to the immobilized protein(s) is compared to the ability of the immunogenic polypeptide(s) added to the assay to compete for binding (the immunogenic polypeptide(s) compete effectively with the immobilized immunogenic polypeptide(s) for binding to the pooled antisera). The percent cross-reactivity for the test proteins is calculated using standard calculations.

In a parallel assay, the ability of the control proteins to compete for binding to the pooled, subtracted antisera is optionally determined as compared to the ability of the immunogenic polypeptide(s) to compete for binding to the antisera. Again, the percent cross-reactivity for the control polypeptides is calculated using standard calculations. Where the percent cross-reactivity is at least 5-10× higher for the test polypeptides, the test polypeptides are said to specifically bind the pooled, subtracted antisera.
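The "standard calculations" are not spelled out here. A common convention, assumed in the sketch below, expresses percent cross-reactivity as 100 × IC50(reference immunogen) / IC50(competitor), so that a competitor needing twice as much protein to reach 50% inhibition scores 50%. The function names, the 5× default drawn from the low end of the quoted 5-10× range, and the IC50 values are hypothetical.

```python
def percent_cross_reactivity(ic50_reference: float, ic50_competitor: float) -> float:
    """Percent cross-reactivity by the common IC50-ratio convention:
    100 * IC50(reference immunogen) / IC50(competitor), both in the same units."""
    if ic50_reference <= 0 or ic50_competitor <= 0:
        raise ValueError("IC50 values must be positive")
    return 100.0 * ic50_reference / ic50_competitor


def binds_pooled_antisera(cr_test: float, cr_control: float, min_ratio: float = 5.0) -> bool:
    """Criterion from the text: test cross-reactivity at least 5-10x that of the
    control; the lower bound (5x) is used as the default here."""
    return cr_test >= min_ratio * cr_control


# Hypothetical IC50 values (same units, e.g. ug/mL) from the competition assays.
cr_test = percent_cross_reactivity(1.0, 2.0)      # test needs 2x the immunogen: 50%
cr_control = percent_cross_reactivity(1.0, 40.0)  # control needs 40x: 2.5%
print(cr_test, cr_control, binds_pooled_antisera(cr_test, cr_control))  # 50.0 2.5 True
```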

In general, the immunoabsorbed and pooled antisera can be used in a competitive binding immunoassay as described herein to compare any test polypeptide to the immunogenic polypeptide(s). In order to make this comparison, the two polypeptides are each assayed at a wide range of concentrations and the amount of each polypeptide required to inhibit 50% of the binding of the subtracted antisera to the immobilized protein is determined using standard techniques. If the amount of the test polypeptide required is less than twice the amount of the immunogenic polypeptide(s) that is required, then the test polypeptide is said to specifically bind to an antibody generated to the immunogenic polypeptide(s), provided the amount is at least about 5-10× higher for a control polypeptide.
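The 50%-inhibition comparison can be sketched as follows. Curve fitting (for example, a four-parameter logistic model) is a common "standard technique," but for brevity this sketch substitutes simple linear interpolation on log concentration between the two bracketing data points. The function names, the 5× control default, and all data are hypothetical.

```python
import math
from typing import Sequence


def ic50_by_interpolation(concentrations: Sequence[float],
                          percent_inhibition: Sequence[float]) -> float:
    """Estimate the concentration giving 50% inhibition by linear interpolation
    on log10(concentration) between the two points that bracket 50%.
    Concentrations must be positive and sorted in increasing order."""
    points = list(zip(concentrations, percent_inhibition))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")


def specifically_binds(ic50_test: float, ic50_immunogen: float,
                       ic50_control: float, control_fold: float = 5.0) -> bool:
    """Rule from the text: the test polypeptide needs < 2x the immunogen amount,
    while a control needs at least control_fold (5-10x; 5 assumed) as much."""
    return (ic50_test < 2.0 * ic50_immunogen
            and ic50_control >= control_fold * ic50_immunogen)


conc = [0.1, 1.0, 10.0, 100.0]        # hypothetical competitor concentrations
inhibition = [5.0, 30.0, 70.0, 95.0]  # hypothetical % inhibition of binding
ic50_test = ic50_by_interpolation(conc, inhibition)
print(round(ic50_test, 3))  # 3.162
print(specifically_binds(ic50_test, ic50_immunogen=2.0, ic50_control=25.0))  # True
```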

As a final determination of specificity, the pooled antiserum is optionally fully immunosorbed with the immunogenic polypeptide(s) (rather than the control polypeptides) until little or no binding of the subtracted, pooled antiserum to the immunogenic polypeptide(s) is detectable. This fully immunosorbed antiserum is then tested for reactivity with the test polypeptide. If little or no reactivity is observed (i.e., no more than 2× the signal-to-noise ratio observed for binding of the fully immunosorbed antiserum to the immunogenic polypeptide(s)), then the test polypeptide is specifically bound by the antisera elicited by the immunogenic polypeptide(s).

Glyphosate-N-Acetyltransferase Polynucleotides

In one aspect, the invention provides a novel family of isolated or recombinant polynucleotides referred to herein as “glyphosate-N-acetyltransferase polynucleotides” or “GAT polynucleotides.” GAT polynucleotide sequences are characterized by the ability to encode a GAT polypeptide. In general, the invention includes any nucleotide sequence that encodes any of the novel GAT polypeptides described herein. In some aspects of the invention, a GAT polynucleotide that encodes a GAT polypeptide with GAT activity is preferred.

In one aspect, the GAT polynucleotides comprise recombinant or isolated forms of naturally occurring nucleic acids isolated from an organism, e.g., a bacterial strain. Exemplary GAT polynucleotides, e.g., SEQ ID NO: 1-5, were discovered by expression cloning of sequences from Bacillus strains exhibiting GAT activity. Briefly, a collection of approximately 500 Bacillus and Pseudomonas strains was screened for native ability to N-acetylate glyphosate. Strains were grown in LB overnight, harvested by centrifugation, permeabilized in dilute toluene, and then washed and resuspended in a reaction mix containing buffer, 5 mM glyphosate, and 200 μM acetyl-CoA. The cells were incubated in the reaction mix for between 1 and 48 hours, at which time an equal volume of methanol was added to the reaction. The cells were then pelleted by centrifugation and the supernatant was filtered before analysis by parent ion mode mass spectrometry. The product of the reaction was positively identified as N-acetylglyphosate by comparing the mass spectrometry profile of the reaction mix to an N-acetylglyphosate standard, as shown in FIG. 2. Product detection was dependent on inclusion of both substrates (acetyl-CoA and glyphosate) and was abolished by heat denaturing the bacterial cells.
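To show what the parent-ion comparison looks for, the sketch below computes monoisotopic masses of glyphosate (C3H8NO5P) and N-acetylglyphosate (C5H10NO6P) from standard isotope masses. N-acetylation adds C2H2O, roughly 42.011 Da. The ionization mode and adduct behind FIG. 2 are not stated in this section, so the deprotonated [M-H]- values are printed only as one plausible example.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163}
PROTON = 1.007276467  # proton mass (Da), removed to form [M-H]-


def monoisotopic_mass(formula: dict) -> float:
    """Sum of element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


glyphosate = {"C": 3, "H": 8, "N": 1, "O": 5, "P": 1}
n_acetylglyphosate = {"C": 5, "H": 10, "N": 1, "O": 6, "P": 1}

m_gly = monoisotopic_mass(glyphosate)
m_nag = monoisotopic_mass(n_acetylglyphosate)
print(f"glyphosate          M = {m_gly:.4f}, [M-H]- = {m_gly - PROTON:.4f}")
print(f"N-acetylglyphosate  M = {m_nag:.4f}, [M-H]- = {m_nag - PROTON:.4f}")
print(f"mass shift on acetylation = {m_nag - m_gly:.4f} Da")
```

This gives about 169.0140 Da for glyphosate and 211.0246 Da for N-acetylglyphosate.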

Individual GAT polynucleotides were then cloned from the identified strains by functional screening. Genomic DNA was prepared and partially digested with the enzyme Sau3AI. Fragments of approximately 4 kb were cloned into an E. coli expression vector and transformed into electrocompetent E. coli. Individual clones exhibiting GAT activity were identified by mass spectrometry following a reaction as described previously, except that the toluene wash was replaced by permeabilization with PMBS. Genomic fragments were sequenced and the putative GAT polypeptide-encoding open reading frame was identified. Identity of the GAT gene was confirmed by expression of the open reading frame in E. coli and detection of high levels of N-acetylglyphosate produced from reaction mixtures.
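How many clones such a partial-digest library needs is not stated in the text. A standard estimate is the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is represented. The sketch applies it to approximately 4 kb inserts. The roughly 4.2 Mb genome size is an assumption for a typical Bacillus genome, not a figure from the text.

```python
import math


def clones_required(insert_size_bp: float, genome_size_bp: float,
                    probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of random clones needed so that a
    given sequence is present in the library with the stated probability:
    N = ln(1 - P) / ln(1 - insert_size / genome_size)."""
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    if not 0 < insert_size_bp < genome_size_bp:
        raise ValueError("insert must be smaller than the genome")
    n = math.log(1 - probability) / math.log(1 - insert_size_bp / genome_size_bp)
    return math.ceil(n)


# ~4 kb inserts per the text; the ~4.2 Mb genome size is an assumption.
print(clones_required(4_000, 4_200_000, 0.99))
```

Under these assumptions the estimate is a little under 5,000 clones for 99% coverage.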

In another aspect of the invention, GAT polynucleotides are produced by diversifying, e.g., recombining and/or mutating, one or more naturally occurring, isolated, or recombinant GAT polynucleotides. As described in more detail elsewhere herein, it is often possible to generate diversified GAT polynucleotides encoding GAT polypeptides with functional attributes, e.g., catalytic function, stability, or expression level, superior to those of a GAT polynucleotide used as a substrate or parent in the diversification process.

The polynucleotides of the invention have a variety of uses, for example: in recombinant production (i.e., expression) of the GAT polypeptides of the invention; as transgenes (e.g., to confer herbicide resistance in transgenic plants); as selectable markers for transformation and plasmid maintenance; as immunogens; as diagnostic probes for the presence of complementary or partially complementary nucleic acids (including for detection of natural GAT coding nucleic acids); as substrates for further diversity generation, e.g., recombination reactions or mutation reactions to produce new and/or improved GAT homologues; and the like.

It is important to note that certain specific, substantial and credible utilities of GAT polynucleotides do not require that the polynucleotide encode a polypeptide with substantial GAT activity. For example, GAT polynucleotides that do not encode active enzymes can be valuable sources of parental polynucleotides for use in diversification procedures to arrive at GAT polynucleotide variants, or non-GAT polynucleotides, with desirable functional properties (e.g., high k_cat or k_cat/K_m, low K_m, high stability towards heat or other environmental factors, high transcription or translation rates, resistance to proteolytic cleavage, reduced antigenicity, etc.). Indeed, nucleotide sequences encoding protease variants with little or no detectable activity have been used as parent polynucleotides in DNA shuffling experiments to produce progeny encoding highly active proteases (Ness et al. (1999) Nature Biotech. 17:893-96).

Polynucleotide sequences produced by diversity generation methods or recursive sequence recombination (“RSR”) methods (e.g., DNA shuffling) are a feature of the invention. Mutation and recombination methods using the nucleic acids described herein are also a feature of the invention. For example, one method of the invention includes recursively recombining one or more nucleotide sequences of the invention, as described above and below, with one or more additional nucleotide sequences. The recombining steps are optionally performed in vivo, ex vivo, in silico or in vitro. This diversity generation or recursive sequence recombination produces at least one library of recombinant modified GAT polynucleotides. Polypeptides encoded by members of this library are included in the invention.
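Of the formats listed above, in silico recombination is the easiest to illustrate. The toy sketch below is not the DNA shuffling protocol itself. It builds a small library by switching between equal-length, pre-aligned parent sequences at random crossover points. The parent sequences, crossover count, seed, and function names are all hypothetical.

```python
import random


def recombine(parents, n_crossovers=2, rng=None):
    """Return one chimera of equal-length, pre-aligned parent sequences.

    At each of n_crossovers distinct random positions the chimera switches to
    a randomly chosen parent (possibly the same one).
    """
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    if not 0 <= n_crossovers < length:
        raise ValueError("too many crossovers for the sequence length")
    cut_points = sorted(rng.sample(range(1, length), n_crossovers)) + [length]
    pieces, start = [], 0
    for end in cut_points:
        donor = rng.choice(parents)
        pieces.append(donor[start:end])
        start = end
    return "".join(pieces)


def build_library(parents, size, n_crossovers=2, seed=0):
    """One round of recombination. In a recursive scheme, screened members of
    the returned library would become the parents of the next round."""
    rng = random.Random(seed)
    return [recombine(parents, n_crossovers, rng) for _ in range(size)]


parents = ["ATGAAACTTGGC", "ATGAGACTCGGA", "ATTAAGCTTGGT"]  # hypothetical, pre-aligned
library = build_library(parents, size=5)
for member in library:
    print(member)
```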

Also contemplated are uses of polynucleotides, also referred to herein as oligonucleotides, typically having at least 12 bases, preferably at least 15, more preferably at least 20, 30, or 50 or more bases, which hybridize under stringent or highly stringent conditions to a GAT polynucleotide sequence. The polynucleotides may be used as probes, primers, sense and antisense agents, and the like, according to methods as noted herein.
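When choosing such oligonucleotides, a quick first estimate of duplex stability is often made with the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). The rule is only a rough guide for short probes (about 14-20 nt) in standard salt and does not define the stringent conditions referred to above. The probe sequence and function names below are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule,
    2*(A+T) + 4*(G+C); only meaningful for short oligonucleotides."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


probe = "ATGGCTAGCTTAGGCA"  # hypothetical 16-mer
print(len(probe), gc_fraction(probe), wallace_tm(probe))  # 16 0.5 48
```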

In accordance with the present invention, GAT polynucleotides, including nucleotide sequences that encode GAT polypeptides, fragments of GAT polypeptides, related fusion proteins, or functional equivalents thereof, are used in recombinant DNA molecules that direct the expression of the GAT polypeptides in appropriate host cells, such as bacterial or plant cells. Due to the inherent degeneracy of the genetic code, other nucleic acid sequences which encode substantially the same or a functionally equivalent amino acid sequence can also be used to clone and express the GAT polynucleotides.
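The degeneracy mentioned above can be seen directly by translating two different coding sequences with the standard genetic code (NCBI translation table 1), built here from its usual compact TCAG-ordered string. The two example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(aa: str) -> list:
    """All codons that encode the given amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


seq_a = "ATGCTGAGCAAA"  # hypothetical
seq_b = "ATGTTATCTAAG"  # different codons, same protein
print(translate(seq_a), translate(seq_b))  # MLSK MLSK
print({aa: len(synonymous_codons(aa)) for aa in "MLSK"})  # {'M': 1, 'L': 6, 'S': 6, 'K': 2}
```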

The invention provides GAT polynucleotides that encode transcription and/or translation products that are subsequently spliced to ultimately produce functional GAT polypeptides. Splicing can be accomplished in vitro or in vivo, and can involve cis- or trans-splicing. The substrate for splicing can be polynucleotides (e.g., RNA transcripts) or polypeptides. An example of cis-splicing of a polynucleotide is where an intron inserted into a coding sequence is removed and the two flanking exon regions are spliced to generate a GAT polypeptide encoding sequence. An example of trans-splicing would be where a GAT polynucleotide is encrypted by separating the coding sequence into two or more fragments that can be separately transcribed and then spliced to form the full-length GAT encoding sequence. The use of a splicing enhancer sequence (which can be introduced into a construct of the invention) can facilitate splicing either in cis or trans. Cis- and trans-splicing of polypeptides are described in more detail elsewhere herein and in U.S. patent application Ser. Nos. 09/517,933 and 09/710,686.
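The intron example above can be sketched as coordinate-based excision. The sketch assumes the intron positions are known, uses DNA letters (GT...AG rather than GU...AG), and checks only the canonical dinucleotide boundaries. Real splice-site recognition involves more sequence context. The sequences and function name are hypothetical.

```python
def remove_introns(pre_mrna: str, introns) -> str:
    """Splice out introns given as non-overlapping (start, end) half-open
    coordinates, requiring canonical GT...AG boundaries (DNA alphabet)."""
    seq = pre_mrna.upper()
    pieces, cursor = [], 0
    for start, end in sorted(introns):
        if start < cursor or end > len(seq) or end - start < 4:
            raise ValueError(f"invalid intron coordinates {(start, end)}")
        intron = seq[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {(start, end)} lacks GT...AG boundaries")
        pieces.append(seq[cursor:start])  # exon preceding this intron
        cursor = end
    pieces.append(seq[cursor:])  # final exon
    return "".join(pieces)


exon1, intron, exon2 = "ATGGCT", "GTAAGTTTTCAG", "AAATAA"  # hypothetical
pre_mrna = exon1 + intron + exon2
print(remove_introns(pre_mrna, [(6, 18)]))  # ATGGCTAAATAA
```

In the same bookkeeping view, trans-splicing of separately transcribed fragments reduces to joining the exon portions in order.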

Thus, some GAT polynucleotides do not directly encode a full-length GAT polypeptide, but rather encode a fragment or fragments of a GAT polypeptide. These GAT polynucleotides can be used to express a functional GAT polypeptide through a mechanism involving splicing, where splicing can occur at the level of polynucleotide (e.g., intron/exon) and/or polypeptide (e.g., intein/extein). This can be useful, for example, in controlling expression of GAT activity, since functional GAT polypeptide will only be expressed if all required fragments are expressed in an environment that permits splicing processes to generate functional product. In another example, introduction of one or more insertion sequences into a GAT polynucleotide can facilitate recombination with a low-homology polynucleotide; use of an intron or intein for the insertion sequence facilitates the removal of the intervening sequence, thereby restoring function of the encoded variant.

As will be understood by those of skill in the art, it can be advantageous to modify a coding sequence to enhance its expression in a particular host. The genetic code is redundant: its 64 possible codons specify only 20 amino acids and a stop signal, and most organisms preferentially use a subset of these codons. The codons that are utilized most often in a species are called optimal codons, and those not utilized very often are classified as rare or low-usage codons (see, e.g., Zhang et al. (1991) Gene 105:61-72). Codons can be substituted to reflect the preferred codon usage of the host, a process sometimes called “codon optimization” or “controlling for species codon bias.”
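Codon substitution as just described can be sketched as a codon-by-codon replacement that never changes the encoded protein. The preferred-codon table below is hypothetical. A real one would be derived from codon-usage statistics for the intended host, and practical optimization tools weigh further factors.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host preferences (one codon per amino acid).
HYPOTHETICAL_PREFERRED = {"L": "CTG", "S": "AGC", "K": "AAG", "R": "CGC", "G": "GGC"}


def codons(dna: str) -> list:
    """Split an in-frame coding sequence into codons."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(dna))


def codon_optimize(dna: str, preferred: dict) -> str:
    """Swap each codon for the listed preferred synonymous codon, if any;
    codons for unlisted amino acids are left unchanged."""
    out = []
    for codon in codons(dna):
        aa = CODON_TABLE[codon]
        new = preferred.get(aa, codon)
        if CODON_TABLE[new] != aa:  # guard against a non-synonymous table entry
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)


original = "ATGTTATCTAAGCGTGGA"  # hypothetical: M L S K R G
optimized = codon_optimize(original, HYPOTHETICAL_PREFERRED)
print(optimized)                                    # ATGCTGAGCAAGCGCGGC
print(translate(original) == translate(optimized))  # True
```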

Optimized coding sequences containing codons preferred by a particular prokaryotic or eukaryotic host (see also Murray et al. (1989) Nucl. Acids Res. 17:477-508) can be prepared, for example, to increase the rate of translation or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life, as compared with transcripts produced from a non-optimized sequence. Translation stop codons can also be modified to reflect host preference. For example, preferred stop codons for S. cerevisiae and mammals are UAA and UGA, respectively. The preferred stop codon for monocotyledonous plants is UGA, whereas insects and E. coli prefer to use UAA as the stop codon (Dalphin et al. (1996) Nucl. Acids Res. 24: 216-218). Methodology for optimizing a nucleotide sequence for expression in a plant is provided, for example, in U.S. Pat. No. 6,015,891, and the references cited therein.
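The stop-codon preferences just listed can be captured in a small lookup table. The sketch writes them in DNA letters (TAA for UAA, TGA for UGA) and replaces the terminal stop codon of an in-frame coding sequence. The host labels, function name, and example sequence are hypothetical.

```python
# Preferred stop codons as stated in the text, written as DNA (T for U).
PREFERRED_STOP = {
    "S. cerevisiae": "TAA",
    "mammals": "TGA",
    "monocots": "TGA",
    "insects": "TAA",
    "E. coli": "TAA",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def set_preferred_stop(cds: str, host: str) -> str:
    """Replace the terminal stop codon with the host's preferred stop codon."""
    cds = cds.upper()
    if len(cds) % 3 or cds[-3:] not in STOP_CODONS:
        raise ValueError("expected an in-frame coding sequence ending in a stop codon")
    return cds[:-3] + PREFERRED_STOP[host]


cds = "ATGGCTAAATAG"  # hypothetical coding sequence ending in TAG
print(set_preferred_stop(cds, "monocots"))  # ATGGCTAAATGA
print(set_preferred_stop(cds, "E. coli"))   # ATGGCTAAATAA
```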

One embodiment of the invention includes a GAT polynucleotide having optimal codons for expression in a relevant host, e.g., a transgenic plant host. This is particularly desirable when a GAT polynucleotide of bacterial origin is introduced into a transgenic plant, e.g., to confer glyphosate resistance to the plant.

The polynucleotide sequences of the present invention can be engineered in order to alter a GAT polynucleotide for a variety of reasons, including but not limited to alterations which modify the cloning, processing and/or expression of the gene product. For example, alterations may be introduced using techniques that are well known in the art, e.g., site-directed mutagenesis, to insert new restriction sites, alter glycosylation patterns, change codon preference, introduce splice sites, etc.
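Inserting a new restriction site without altering the encoded protein is a common design task before site-directed mutagenesis. The sketch below handles only the design step. It tries every placement of a recognition sequence, here the BamHI site GGATCC, and keeps placements that leave the translation unchanged, fewest base changes first. The function name and example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def silent_site_insertions(cds: str, site: str):
    """List (position, mutated_cds, n_base_changes) for every placement of a
    restriction site that leaves the encoded protein unchanged, fewest base
    changes first. Placements where the site is already present are skipped."""
    cds, site = cds.upper(), site.upper()
    protein = translate(cds)
    hits = []
    for i in range(len(cds) - len(site) + 1):
        candidate = cds[:i] + site + cds[i + len(site):]
        changes = sum(a != b for a, b in zip(cds[i:i + len(site)], site))
        if changes and translate(candidate) == protein:
            hits.append((changes, i, candidate))
    return [(i, candidate, n) for n, i, candidate in sorted(hits)]


# Hypothetical coding sequence (M G S K); GGATCC is the BamHI recognition site.
print(silent_site_insertions("ATGGGTTCTAAA", "GGATCC"))
# [(3, 'ATGGGATCCAAA', 2)]
```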

As described in more detail herein, the polynucleotides of the invention include sequences which encode novel GAT polypeptides and sequences complementary to the coding sequences, and novel fragments of coding sequences and complements thereof. The polynucleotides can be in the form of RNA or in the form of DNA, and include mRNA, cRNA, synthetic RNA and DNA, genomic DNA and cDNA. The polynucleotides can be double-stranded or single-stranded, and if single-stranded, can be the coding strand or the non-coding (anti-sense, complementary) strand. The polynucleotides optionally include the coding sequence of a GAT polypeptide (i) in isolation, (ii) in combination with an additional coding sequence, so as to encode, e.g., a fusion protein, a pre-protein, a prepro-protein, or the like, (iii) in combination with non-coding sequences, such as introns or inteins, control elements such as a promoter, an enhancer, a terminator element, or 5′ and/or 3′ untranslated regions effective for expression of the coding sequence in a suitable host, and/or (iv) in a vector or host environment in which the GAT polynucleotide is a heterologous gene. Sequences can also be found in combination with typical compositional formulations of nucleic acids, including in the presence of carriers, buffers, adjuvants, excipients and the like.
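The relationship between the coding strand, the non-coding (complementary) strand, and the RNA form described above can be shown with two short helpers. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary (non-coding) strand, written 5'->3'."""
    if set(dna.upper()) - set("ACGT"):
        raise ValueError("only A, C, G, T are supported")
    return dna.translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """RNA form of a DNA strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


coding = "ATGGCTAAATGA"  # hypothetical coding strand, 5'->3'
print(reverse_complement(coding))  # TCATTTAGCCAT
print(to_rna(coding))              # AUGGCUAAAUGA
```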

Polynucleotides and oligonucleotides of the invention can be prepared by standard solid-phase methods, according to known synthetic methods. Typically, fragments of up to about 100 bases are individually synthesized, then joined (e.g., by enzymatic or chemical ligation methods, or polymerase mediated methods) to form essentially any desired continuous sequence. For example, polynucleotides and oligonucleotides of the invention can be prepared by chemical synthesis using, e.g., the classical phosphoramidite method described by Beaucage et al. (1981) Tetrahedron Letters 22:1859-69, or the method described by Matthes et al. (1984) EMBO J. 3: 801-05, e.g., as is typically practiced in automated synthetic methods. According to the phosphoramidite method, oligonucleotides are synthesized, e.g., in an automatic DNA synthesizer, purified, annealed, ligated and cloned in appropriate vectors.
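The "synthesize short fragments, then join" strategy above can be sketched as a design step that tiles a target sequence with fragments of at most about 100 bases, each overlapping the next. The 20-base overlap is an assumption. Strand orientation, Tm matching, and other practical design rules are deliberately omitted. A reassembly check confirms the tiling reproduces the target.

```python
import random


def split_into_overlapping_oligos(target: str, max_len: int = 100, overlap: int = 20):
    """Tile a target with fragments of at most max_len bases, each sharing
    `overlap` bases with the next (input for ligation- or polymerase-based
    assembly; strand and orientation handling omitted)."""
    if not 0 < overlap < max_len:
        raise ValueError("need 0 < overlap < max_len")
    step = max_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(target[start:start + max_len])
        if start + max_len >= len(target):
            break
        start += step
    return fragments


def reassemble(fragments, overlap: int = 20) -> str:
    """Inverse check: merge fragments by dropping each shared overlap."""
    seq = fragments[0]
    for frag in fragments[1:]:
        if seq[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not overlap as expected")
        seq += frag[overlap:]
    return seq


rng = random.Random(1)
target = "".join(rng.choice("ACGT") for _ in range(250))  # hypothetical 250-nt target
oligos = split_into_overlapping_oligos(target)
print([len(o) for o in oligos])      # [100, 100, 90]
print(reassemble(oligos) == target)  # True
```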

In addition, essentially any nucleic acid can be custom ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company (mcrc@oligos.com), The Great American Gene Company (www.genco.com), ExpressGen Inc. (www.expressgen.com), Operon Technologies Inc. (Alameda, Calif.) and many others. Similarly, peptides and antibodies can be custom ordered from any of a variety of sources, such as PeptidoGenic (pkim@ccnet.com), HTI Bio-products, Inc. (www.htibio.com), BMA Biomedicals Ltd (U.K.), Bio.Synthesis, Inc., and many others.

Polynucleotides may also be synthesized by well-known techniques as described in the technical literature. See, e.g., Carruthers et al., Cold Spring Harbor Symp. Quant. Biol. 47: 411-418 (1982), and Adams et al. (1983) J. Am. Chem. Soc. 105: 661. Double-stranded DNA fragments may then be obtained either by synthesizing the complementary strand and annealing the strands together under appropriate conditions, or by adding the complementary strand using DNA polymerase with an appropriate primer sequence.
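Both routes to a double-stranded fragment depend on base pairing. The second strand, read 5′→3′, is the reverse complement of the first. A primer that anneals at the first strand's 3′ end is therefore a prefix of that reverse complement, and polymerase extension fills in the rest. The following is a minimal sketch restricted to unambiguous DNA bases; `reverse_complement`, `anneals_fully` and `extend_primer` are hypothetical helper names.

```python
# Watson-Crick pairing for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]


def anneals_fully(top: str, bottom: str) -> bool:
    """True if two separately synthesized strands (both 5'->3') pair end to end."""
    return len(top) == len(bottom) and reverse_complement(top) == bottom


def extend_primer(template: str, primer: str) -> str:
    """Model polymerase fill-in: a primer annealed at the template's 3' end
    is extended 5'->3' into the full complementary strand."""
    full = reverse_complement(template)
    if not full.startswith(primer):
        raise ValueError("primer does not anneal at the template's 3' end")
    return full


print(reverse_complement("ATGGCC"))  # GGCCAT
```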

General texts which describe molecular biological techniques useful herein, including mutagenesis, include Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, Volume 152 (Academic Press, Inc., San Diego, Calif.); Sambrook et al. (1989) Molecular Cloning—A Laboratory Manual, 2nd ed., Volumes 1-3 (Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.); and Ausubel et al., eds. (2000) Current Protocols in Molecular Biology (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.). Examples of techniques sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA) are found in Berger, Sambrook, and Ausubel, as well as in Mullis et al. (1987) U.S. Pat. No. 4,683,202; Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press Inc. San Diego, Calif.); Arnheim & Levinson (Oct. 1, 1990) Chemical and Engineering News 36-47; The Journal Of NIH Research (1991) 3: 81-94; Kwoh et al. (1989) Proc. Nat'l. Acad. Sci. USA 86: 1173; Guatelli et al. (1990) Proc. Nat'l. Acad. Sci. USA 87: 1874; Lomell et al. (1989) J. Clin. Chem. 35: 1826; Landegren et al. (1988) Science 241: 1077-1080; Van Brunt (1990) Biotechnology 8: 291-294; Wu and Wallace (1989) Gene 4: 560; Barringer et al. (1990) Gene 89: 117, and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al. U.S. Pat. No. 5,426,039. Improved methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references cited therein, in which PCR amplicons of up to 40 kb are generated.
One of skill will appreciate that essentially any RNA can be converted into a double-stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See Ausubel, Sambrook and Berger, all supra.

One aspect of the invention provides an isolated or recombinant polynucleotide selected from the group consisting of SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952.

Preferred polynucleotides of the present invention include an isolated or recombinant polynucleotide sequence encoding an amino acid sequence that can be optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, wherein one or more of the following positions conform to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P, and further wherein, of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 90% conform to the following restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 31, 45, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 123, 129, 139, and/or 145 the amino acid residue is B1; and (b) at positions 3, 5, 8, 10, 11, 14, 17, 24, 27, 32, 37, 47, 48, 49, 52, 57, 58, 61, 63, 68, 69, 79, 80, 82, 83, 89, 92, 100, 101, 104, 119, 120, 125, 126, 128, 131, and/or 143 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T. When used to specify an amino acid or amino acid residue, the single letter designations A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y have their standard meaning as used in the art and as provided in Table 1 herein.
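The similarity-score criterion (a score of at least 460 with BLOSUM62, a gap existence penalty of 11 and a gap extension penalty of 1) uses parameters of the kind supplied to BLASTP. Under the NCBI convention, a gap of length k costs the existence penalty plus k times the extension penalty, so a 1-residue gap costs 12 and a 2-residue gap costs 13. The sketch below computes an exhaustive local alignment score (Smith-Waterman with Gotoh affine gaps) under that convention. It rests on several assumptions: the substitution function is passed in rather than reproducing the BLOSUM62 table, local (rather than global) scoring is assumed, and BLAST itself is heuristic and may report a slightly different score. `local_affine_score` is a hypothetical name.

```python
from typing import Callable


def local_affine_score(
    a: str,
    b: str,
    score: Callable[[str, str], int],
    gap_open: int = 11,
    gap_extend: int = 1,
) -> int:
    """Best local alignment score of a vs b (Smith-Waterman with Gotoh affine gaps).

    A gap of length k costs gap_open + k * gap_extend (NCBI convention).
    """
    neg = float("-inf")
    first = gap_open + gap_extend      # cost of a gap of length 1
    m = len(b)
    prev_h = [0] * (m + 1)             # H for row i-1 (local: never below 0)
    prev_f = [neg] * (m + 1)           # F for row i-1: ends in a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (m + 1)
        f = [neg] * (m + 1)
        e = neg                        # E along row i: ends in a gap in a
        for j in range(1, m + 1):
            e = max(e - gap_extend, h[j - 1] - first)
            f[j] = max(prev_f[j] - gap_extend, prev_h[j] - first)
            h[j] = max(0, prev_h[j - 1] + score(a[i - 1], b[j - 1]), e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


def toy(x: str, y: str) -> int:
    """Placeholder scoring for illustration only (not BLOSUM62)."""
    return 5 if x == y else -20


print(local_affine_score("GGGGGTGGGGG", "GGGGGGGGGG", toy))  # 10*5 - 12 = 38
```

With a real BLOSUM62 lookup passed as `score` and a reference sequence as `b`, the returned value could be compared against the 460 threshold.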

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, one or more of the following positions conform to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P, and further wherein, of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 129, 139, and/or 145 the amino acid residue is Z1; (b) at positions 31 and/or 45 the amino acid residue is Z2; (c) at position 8 the amino acid residue is Z3; (d) at position 89 the amino acid residue is Z3 or Z6; (e) at positions 82, 92, 101 and/or 120 the amino acid residue is Z4; (f) at positions 3, 11, 27 and/or 79 the amino acid residue is Z5; (g) at position 18 the amino acid residue is Z4 or Z5; (h) at position 123 the amino acid residue is Z1 or Z2; (i) at positions 12, 33, 35, 39, 53, 59, 112, 132, 135, 140, and/or 146 the amino acid residue is Z1 or Z3; (j) at position 30 the amino acid residue is Z1; (k) at position 6 the amino acid residue is Z6; (l) at position 81 the amino acid residue is Z2 or Z4; (m) at position 113 the amino acid residue is Z3; (n) at position 138 the amino acid residue is Z4; (o) at position 142 the amino acid residue is Z2; (p) at positions 57 and/or 126 the amino acid residue is Z3 or Z4; (q) at positions 5, 17, and 61 the amino acid residue is Z4; (r) at position 24 the amino acid residue is Z3; (s) at position 104 the amino acid residue is Z5; (t) at positions 52 and/or 69 the amino acid residue is Z3; (u) at positions 14 and/or 119 the amino acid residue is Z5; (v) at positions 10, 32, 63, and/or 83 the amino acid residue is Z5; (w) at positions 48 and/or 80 the amino acid residue is Z6; (x) at position 40 the amino acid residue is Z1 or Z2; (y) at position 96 the amino acid residue is Z3 or Z5; (z) at position 65 the amino acid residue is Z3, Z4, or Z6; (aa) at positions 84 and/or 115 the amino acid residue is Z3; (ab) at position 93 the amino acid residue is Z4; (ac) at position 130 the amino acid residue is Z2; (ad) at position 58 the amino acid residue is Z3, Z4, or Z6; (ae) at position 47 the amino acid residue is Z4 or Z6; (af) at positions 49 and/or 100 the amino acid residue is Z3 or Z4; (ag) at position 68 the amino acid residue is Z4 or Z5; (ah) at position 143 the amino acid residue is Z4; (ai) at position 131 the amino acid residue is Z5; (aj) at positions 125 and/or 128 the amino acid residue is Z5; (ak) at position 67 the amino acid residue is Z3 or Z4; (al) at position 60 the amino acid residue is Z5; and (am) at position 37 the amino acid residue is Z4 or Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.
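The percentage criteria in these paragraphs amount to a count over reference positions. For each listed position, check whether the aligned residue falls in the allowed class (or union of classes), then compare the fraction of conforming residues with the threshold. Following directly from the definitions, B1 is the union of Z1 and Z2, and B2 is the union of Z3 through Z6. The sketch below encodes only a handful of rules (a)-(am) for illustration. It assumes that positions with no aligned residue (gaps) are left out of the count, which is one reading of "the amino acid residues ... that correspond to the following positions". The names `RULES` and `fraction_conforming` are hypothetical.

```python
# Residue classes as defined in the text.
Z = {
    "Z1": frozenset("AILMV"),
    "Z2": frozenset("FWY"),
    "Z3": frozenset("NQST"),
    "Z4": frozenset("RHK"),
    "Z5": frozenset("DE"),
    "Z6": frozenset("CGP"),
}
# The broader B classes decompose into the Z classes.
B1 = Z["Z1"] | Z["Z2"]
B2 = Z["Z3"] | Z["Z4"] | Z["Z5"] | Z["Z6"]

# Illustrative subset of restrictions (a)-(am): reference position -> allowed classes.
RULES = {
    6: ("Z6",),              # (k)
    8: ("Z3",),              # (c)
    18: ("Z4", "Z5"),        # (g)
    65: ("Z3", "Z4", "Z6"),  # (z)
    89: ("Z3", "Z6"),        # (d)
}


def allowed(classes: tuple[str, ...]) -> frozenset[str]:
    """Union of the named residue classes."""
    return frozenset().union(*(Z[c] for c in classes))


def fraction_conforming(residues: dict[int, str], rules: dict) -> float:
    """residues maps 1-based reference positions to the aligned query residue.
    Positions absent from `residues` (gaps) are skipped, an assumption."""
    checked = [pos for pos in rules if pos in residues]
    if not checked:
        return 0.0
    ok = sum(residues[pos] in allowed(rules[pos]) for pos in checked)
    return ok / len(checked)


def meets(residues: dict[int, str], rules: dict, threshold: float = 0.80) -> bool:
    return fraction_conforming(residues, rules) >= threshold


query = {6: "A", 8: "S", 18: "E", 65: "P", 89: "G"}  # toy data: 4 of 5 conform
print(fraction_conforming(query, RULES))  # 0.8
```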

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence wherein, of the amino acid residues in the amino acid sequence that correspond to the positions specified in (a)-(am), at least 90% conform to the amino acid residue restrictions specified in (a)-(am).

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 90% of the amino acid residues in the amino acid sequence conform to the following restrictions: (a) at positions 1, 7, 9, 13, 20, 36, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.
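Every positional restriction is stated in the numbering of the reference sequence (SEQ ID NO: 300, 445, or 457) after optimal alignment. Before any rule can be checked, the query residue corresponding to each reference position has to be read off the alignment. The sketch below assumes the alignment is given as two equal-length rows with `-` marking gaps. Reference positions opposite a query gap get no residue, and query insertions (columns with a gap in the reference) get no reference position. `map_reference_positions` is a hypothetical name, and the sequences in the example are toy data.

```python
def map_reference_positions(ref_aln: str, query_aln: str) -> dict[int, str]:
    """Map each 1-based reference position to the aligned query residue.

    Reference positions opposite a query gap are omitted; query residues
    opposite a reference gap (insertions) are not assigned a position.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    mapping: dict[int, str] = {}
    ref_pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion in the query: no reference numbering
        ref_pos += 1
        if q != "-":
            mapping[ref_pos] = q
    return mapping


print(map_reference_positions("MIE-VKP", "MLEQV-P"))
# {1: 'M', 2: 'L', 3: 'E', 4: 'V', 6: 'P'}
```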

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 90% of the amino acid residues in the amino acid sequence conform to the following restrictions: (a) at positions 1, 7, 9, 13, 20, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 36, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 90% of the amino acid residues in the amino acid sequence conform to the following restrictions: (a) at positions 1, 7, 9, 20, 42, 50, 72, 75, 76, 78, 94, 98, 110, 121, and/or 141 the amino acid residue is Z1; (b) at positions 13, 46, 56, 70, 107, 117, and/or 118 the amino acid residue is Z2; (c) at positions 23, 55, 71, 77, 88, and/or 109 the amino acid residue is Z3; (d) at positions 16, 21, 41, 73, 85, 99, and/or 111 the amino acid residue is Z4; (e) at positions 34 and/or 95 the amino acid residue is Z5; (f) at positions 22, 25, 29, 43, 44, 66, 74, 87, 102, 108, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence further comprising at position 36 an amino acid residue selected from the group consisting of Z1 and Z3. Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence further comprising at position 64 an amino acid residue selected from the group consisting of Z1 and Z2.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 80% of the amino acid residues in the amino acid sequence conform to the following restrictions: (a) at position 2 the amino acid residue is I or L; (b) at position 3 the amino acid residue is E; (c) at position 4 the amino acid residue is V or I; (d) at position 5 the amino acid residue is K; (e) at position 6 the amino acid residue is P; (f) at position 8 the amino acid residue is N; (g) at position 10 the amino acid residue is E; (h) at position 11 the amino acid residue is D or E; (i) at position 12 the amino acid residue is T; (j) at position 14 the amino acid residue is E or D; (k) at position 15 the amino acid residue is L; (l) at position 17 the amino acid residue is H; (m) at position 18 the amino acid residue is R, E or K; (n) at position 19 the amino acid residue is I or V; (o) at position 24 the amino acid residue is Q; (p) at position 26 the amino acid residue is M, L, V or I; (q) at position 27 the amino acid residue is E; (r) at position 28 the amino acid residue is A or V; (s) at position 30 the amino acid residue is M; (t) at position 31 the amino acid residue is Y or F; (u) at position 32 the amino acid residue is E or D; (v) at position 33 the amino acid residue is T or S; (w) at position 35 the amino acid residue is L; (x) at position 37 the amino acid residue is R, G, E or Q; (y) at position 39 the amino acid residue is A or S; (z) at position 40 the amino acid residue is F or L; (aa) at position 45 the amino acid residue is Y or F; (ab) at position 47 the amino acid residue is R or G; (ac) at position 48 the amino acid residue is G; (ad) at position 49 the amino acid residue is K, R, or Q; (ae) at position 51 the amino acid residue is I or V; (af) at position 52 the amino acid residue is S; (ag) at position 53 the amino acid residue is I or V; (ah) at position 54 the amino acid residue is A; (ai) at position 57 the amino acid residue is H or N; (aj) at position 58 the amino acid residue is Q, K, R or P; (ak) at position 59 the amino acid residue is A; (al) at position 60 the amino acid residue is E; (am) at position 61 the amino acid residue is H or R; (an) at position 63 the amino acid residue is E or D; (ao) at position 65 the amino acid residue is E, P or Q; (ap) at position 67 the amino acid residue is Q or R; (aq) at position 68 the amino acid residue is K or E; (ar) at position 69 the amino acid residue is Q; (as) at position 79 the amino acid residue is E; (at) at position 80 the amino acid residue is G; (au) at position 81 the amino acid residue is Y, H or F; (av) at position 82 the amino acid residue is R; (aw) at position 83 the amino acid residue is E or D; (ax) at position 84 the amino acid residue is Q; (ay) at position 86 the amino acid residue is A; (az) at position 89 the amino acid residue is G, T or S; (ba) at position 90 the amino acid residue is L; (bb) at position 91 the amino acid residue is L, I or V; (bc) at position 92 the amino acid residue is R or K; (bd) at position 93 the amino acid residue is H; (be) at position 96 the amino acid residue is E or Q; (bf) at position 97 the amino acid residue is I; (bg) at position 100 the amino acid residue is K or N; (bh) at position 101 the amino acid residue is K or R; (bi) at position 103 the amino acid residue is A or V; (bj) at position 104 the amino acid residue is D; (bk) at position 105 the amino acid residue is M, L or I; (bl) at position 106 the amino acid residue is L; (bm) at position 112 the amino acid residue is T or A; (bn) at position 113 the amino acid residue is S or T; (bo) at position 114 the amino acid residue is A; (bp) at position 115 the amino acid residue is S; (bq) at position 119 the amino acid residue is K or R; (br) at position 120 the amino acid residue is K or R; (bs) at position 123 the amino acid residue is F or L; (bt) at position 125 the amino acid residue is E; (bu) at position 126 the amino acid residue is Q or H; (bv) at position 128 the amino acid residue is E or D; (bw) at position 129 the amino acid residue is V or I; (bx) at position 130 the amino acid residue is F; (by) at position 131 the amino acid residue is D or E; (bz) at position 132 the amino acid residue is T; (ca) at position 135 the amino acid residue is V; (cb) at position 138 the amino acid residue is H; (cc) at position 139 the amino acid residue is I; (cd) at position 140 the amino acid residue is L or M; (ce) at position 142 the amino acid residue is Y; (cf) at position 143 the amino acid residue is K or R; (cg) at position 145 the amino acid residue is L or I; and (ch) at position 146 the amino acid residue is T.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 90% of the amino acid residues in the amino acid sequence conform to the amino acid residue restrictions specified in (a)-(ch) above.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with a reference amino acid sequence selected from the group consisting of SEQ ID NO: 300, 445, and 457 to generate a similarity score of at least 460 using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, one or more of the following positions conform to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P, further wherein, of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following restrictions: (a) at positions 9, 76, 94 and 110 the amino acid residue is A; (b) at positions 29 and 108 the amino acid residue is C; (c) at position 34 the amino acid residue is D; (d) at position 95 the amino acid residue is E; (e) at position 56 the amino acid residue is F; (f) at positions 43, 44, 66, 74, 87, 102, 116, 122, 127 and 136 the amino acid residue is G; (g) at position 41 the amino acid residue is H; (h) at position 7 the amino acid residue is I; (i) at position 85 the amino acid residue is K; (j) at positions 20, 42, 50, 78 and 121 the amino acid residue is L; (k) at positions 1 and 141 the amino acid residue is M; (l) at positions 23 and 109 the amino acid residue is N; (m) at positions 22, 25, 133, 134 and 137 the amino acid residue is P; (n) at position 71 the amino acid residue is Q; (o) at positions 16, 21, 73, 99 and 111 the amino acid residue is R; (p) at position 55 the amino acid residue is S; (q) at position 77 the amino acid residue is T; (r) at position 107 the amino acid residue is W; and (s) at positions 13, 46, 70 and 118 the amino acid residue is Y.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence which conforms to at least one of the following additional restrictions: (a) at position 36 the amino acid residue is M, L, or T; (b) at position 72 the amino acid residue is L or I; (c) at position 75 the amino acid residue is M or V; (d) at position 64 the amino acid residue is L, I, or F; (e) at position 88 the amino acid residue is T or S; and (f) at position 117 the amino acid residue is Y or F.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence in which at least one of the following additional conditions is met: (a) at position 14 the amino acid residue is D; (b) at position 18 the amino acid residue is E; (c) at position 26 the amino acid residue is M or V; (e) at position 30 the amino acid residue is I; (f) at position 32 the amino acid residue is D; (g) at position 36 the amino acid residue is M or T; (i) at position 37 the amino acid residue is C; (j) at position 38 the amino acid residue is D; (j) at position 53 the amino acid residue is V; (k) at position 58 the amino acid residue is R; (l) at position 61 the amino acid residue is R; (m) at position 62 the amino acid residue is L; (n) at position 64 the amino acid residue is I or F; (o) at position 65 the amino acid residue is P; (p) at position 72 the amino acid residue is I; (q) at position 75 the amino acid residue is V; (r) at position 88 the amino acid residue is T; (s) at position 89 the amino acid residue is G; (t) at position 91 the amino acid residue is L; (u) at position 98 the amino acid residue is I; (v) at position 105 the amino acid residue is I; (w) at position 112 the amino acid residue is A; (x) at position 124 the amino acid residue is G or C; (y) at position 128 the amino acid residue is D; (z) at position 140 the amino acid residue is M; (aa) at position 143 the amino acid residue is R; and (ab) at position 144 the amino acid residue is W.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence wherein, of the amino acid residues in the amino acid sequence that correspond to the positions specified in (a) through (ab) as described above, at least 80% conform to the amino acid residue restrictions specified in (a) through (ab).

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence which conforms to at least one of the following additional restrictions: (a) at position 41 the amino acid residue is H; (b) at position 138 the amino acid residue is H; (c) at position 34 the amino acid residue is N; and (d) at position 55 the amino acid residue is S.

Some preferred isolated or recombinant polynucleotides of the invention are selected from the group consisting of: (a) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) a nucleotide sequence encoding an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) a nucleotide sequence encoding an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:590.
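Each member of this group pairs a reference sequence with its own minimum identity, ranging from 89% for SEQ ID NO:677 to 98% for several others. The sketch below records those thresholds as a lookup table and tests a computed identity against them. How percent identity is calculated is an assumption here: the code counts identical aligned residues as a percentage of the reference length (gaps excluded), which is only one of several common conventions, and the document's own definition should govern. `MIN_IDENTITY`, `percent_identity` and `meets_identity` are hypothetical names, and the example alignments are toy data.

```python
# Minimum percent identity per reference sequence, from items (a)-(n) above.
MIN_IDENTITY = {
    577: 98, 578: 97, 621: 97, 579: 98, 602: 98, 697: 95, 721: 96,
    613: 97, 677: 89, 584: 96, 707: 98, 616: 98, 612: 96, 590: 98,
}


def percent_identity(ref_aln: str, query_aln: str) -> float:
    """Identical aligned residues as a percentage of the reference length.

    Assumed convention: the denominator is the number of reference residues
    (gap characters '-' excluded).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    ref_len = sum(c != "-" for c in ref_aln)
    if ref_len == 0:
        raise ValueError("reference row contains no residues")
    identical = sum(r == q and r != "-" for r, q in zip(ref_aln, query_aln))
    return 100 * identical / ref_len


def meets_identity(seq_id: int, identity: float) -> bool:
    """True if identity reaches the minimum listed for that SEQ ID NO."""
    return identity >= MIN_IDENTITY[seq_id]


pid = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")  # 9 of 10 identical
print(pid, meets_identity(677, pid), meets_identity(577, pid))  # 90.0 True False
```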

Some preferred isolated or recombinant polynucleotides of the invention are selected from the group consisting of: (a) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) a nucleotide sequence encoding an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) a nucleotide sequence encoding an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:590, wherein the following positions conform to the following restrictions: (i) at positions 18 and 38, there is a Z5 amino acid residue; (ii) at position 62, there is a Z1 amino acid residue; (iii) at position 124, there is a Z6 amino acid residue; and (iv) at position 144, there is a Z2 amino acid residue, wherein: Z1 is an amino acid residue selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid residue selected from the group consisting of F, W, and Y; Z5 is an amino acid residue selected from the group consisting of D and E; and Z6 is an amino acid residue selected from the group consisting of C, G, and P.

Some preferred isolated or recombinant polynucleotides of the invention are selected from the group consisting of: (a) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) a nucleotide sequence encoding an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) a nucleotide sequence encoding an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:590, further wherein, of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 90% conform to the following restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 31, 45, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 123, 129, 139, and/or 145 the amino acid residue is B1; and (b) at positions 3, 5, 8, 10, 11, 14, 17, 24, 27, 32, 37, 47, 48, 49, 52, 57, 58, 61, 63, 68, 69, 79, 80, 82, 83, 89, 92, 100, 101, 104, 119, 120, 125, 126, 128, 131, and/or 143 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred isolated or recombinant polynucleotides of the invention are selected from the group consisting of: (a) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) a nucleotide sequence encoding an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) a nucleotide sequence encoding an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:590, and further wherein, of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following restrictions: (a) at positions 2, 4, 15, 19, 26, 28, 51, 54, 86, 90, 91, 97, 103, 105, 106, 114, 129, 139, and/or 145 the amino acid residue is Z1; (b) at positions 31 and/or 45 the amino acid residue is Z2; (c) at position 8 the amino acid residue is Z3; (d) at position 89 the amino acid residue is Z3 or Z6; (e) at positions 82, 92, 101 and/or 120 the amino acid residue is Z4; (f) at positions 3, 11, 27 and/or 79 the amino acid residue is Z5; (g) at position 18 the amino acid residue is Z4 or Z5; (h) at position 123 the amino acid residue is Z1 or Z2; (i) at positions 12, 33, 35, 39, 53, 59, 112, 132, 135, 140, and/or 146 the amino acid residue is Z1 or Z3; (j) at position 30 the amino acid residue is Z1; (k) at position 6 the amino acid residue is Z6; (l) at position 81 the amino acid residue is Z2 or Z4; (m) at position 113 the amino acid residue is Z3; (n) at position 138 the amino acid residue is Z4; (o) at position 142 the amino acid residue is Z2; (p) at positions 57 and/or 126 the amino acid residue is Z3 or Z4; (q) at positions 5, 17, and 61 the amino acid residue is Z4; (r) at position 24 the amino acid residue is Z3; (s) at position 104 the amino acid residue is Z5; (t) at positions 52 and/or 69 the amino acid residue is Z3; (u) at positions 14 and/or 119 the amino acid residue is Z5; (v) at positions 10, 32, 63, and/or 83 the amino acid residue is Z5; (w) at positions 48 and/or 80 the amino acid residue is Z6; (x) at position 40 the amino acid residue is Z1 or Z2; (y) at position 96 the amino acid residue is Z3 or Z5; (z) at position 65 the amino acid residue is Z3, Z4, or Z6; (aa) at positions 84 and/or 115 the amino acid residue is Z3; (ab) at position 93 the amino acid residue is Z4; (ac) at position 130 the amino acid residue is Z2; (ad) at position 58 the amino acid residue is Z3, Z4 or Z6; (ae) at position 47 the amino acid residue is Z4 or Z6; (af) at positions 49 and/or 100 the amino acid residue is Z3 or Z4; (ag) at position 68 the amino acid residue is Z4 or Z5; (ah) at position 143 the amino acid residue is Z4; (ai) at position 131 the amino acid residue is Z5; (aj) at positions 125 and/or 128 the amino acid residue is Z5; (ak) at position 67 the amino acid residue is Z3 or Z4; (al) at position 60 the amino acid residue is Z5; and (am) at position 37 the amino acid residue is Z4 or Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.
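The "at least 80% conform" test above can be made concrete with a small scoring function. The sketch below assumes the sequence has already been aligned so that index 0 corresponds to position 1, and it encodes only a handful of restrictions (a)-(am) as an illustrative subset; the names `Z_CLASSES`, `SUBSET`, `fraction_conforming`, and `meets_threshold` are hypothetical and not part of the specification.

```python
# Residue classes Z1-Z6 as defined in the text.
Z_CLASSES = {
    "Z1": set("AILMV"),
    "Z2": set("FWY"),
    "Z3": set("NQST"),
    "Z4": set("RHK"),
    "Z5": set("DE"),
    "Z6": set("CGP"),
}

# Illustrative SUBSET of restrictions (a)-(am): 1-based position -> allowed classes.
# This is NOT the full list from the text.
SUBSET = {
    2: ("Z1",), 3: ("Z5",), 4: ("Z1",), 6: ("Z6",), 8: ("Z3",),
    31: ("Z2",), 89: ("Z3", "Z6"), 123: ("Z1", "Z2"),
}


def fraction_conforming(seq: str, restrictions: dict) -> float:
    """Fraction of restricted positions whose residue lies in an allowed class.

    Assumes `seq` is already aligned to the reference numbering (seq[0] is position 1).
    Positions beyond the end of `seq` count as non-conforming.
    """
    hits = 0
    for pos, classes in restrictions.items():
        allowed = set().union(*(Z_CLASSES[c] for c in classes))
        if pos <= len(seq) and seq[pos - 1] in allowed:
            hits += 1
    return hits / len(restrictions)


def meets_threshold(seq: str, restrictions: dict, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the restricted positions conform."""
    return fraction_conforming(seq, restrictions) >= threshold
```

In practice the table would be populated with every position named in (a)-(am); the 90% variant described next only changes `threshold`.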

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence further wherein, of the amino acid residues in the amino acid sequence that correspond to the positions specified in (a)-(am), at least 90% conform to the amino acid residue restrictions specified in (a)-(am).

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence in which, of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 90% conform to the following additional restrictions: (a) at positions 1, 7, 9, 13, 20, 36, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 90% of the amino acid residues in the amino acid sequence conform to the following restrictions: (a) at positions 1, 7, 9, 13, 20, 42, 46, 50, 56, 64, 70, 72, 75, 76, 78, 94, 98, 107, 110, 117, 118, 121, and/or 141 the amino acid residue is B1; and (b) at positions 16, 21, 22, 23, 25, 29, 34, 36, 41, 43, 44, 55, 66, 71, 73, 74, 77, 85, 87, 88, 95, 99, 102, 108, 109, 111, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is B2; wherein B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y, and V; and B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 90% of the amino acid residues in the amino acid sequence conform to the following restrictions: (a) at positions 1, 7, 9, 20, 42, 50, 72, 75, 76, 78, 94, 98, 110, 121, and/or 141 the amino acid residue is Z1; (b) at positions 13, 46, 56, 70, 107, 117, and/or 118 the amino acid residue is Z2; (c) at positions 23, 55, 71, 77, 88, and/or 109 the amino acid residue is Z3; (d) at positions 16, 21, 41, 73, 85, 99, and/or 111 the amino acid residue is Z4; (e) at positions 34 and/or 95 the amino acid residue is Z5; (f) at positions 22, 25, 29, 43, 44, 66, 74, 87, 102, 108, 116, 122, 127, 133, 134, 136, and/or 137 the amino acid residue is Z6; wherein Z1 is an amino acid selected from the group consisting of A, I, L, M, and V; Z2 is an amino acid selected from the group consisting of F, W, and Y; Z3 is an amino acid selected from the group consisting of N, Q, S, and T; Z4 is an amino acid selected from the group consisting of R, H, and K; Z5 is an amino acid selected from the group consisting of D and E; and Z6 is an amino acid selected from the group consisting of C, G, and P.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence further comprising at position 36 an amino acid residue selected from the group consisting of Z1 and Z3. Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence further comprising at position 64 an amino acid residue selected from the group consisting of Z1 and Z2.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 80% of the amino acid residues in the amino acid sequence conform to the following restrictions: (a) at position 2 the amino acid residue is I or L; (b) at position 3 the amino acid residue is E; (c) at position 4 the amino acid residue is V or I; (d) at position 5 the amino acid residue is K; (e) at position 6 the amino acid residue is P; (f) at position 8 the amino acid residue is N; (g) at position 10 the amino acid residue is E; (h) at position 11 the amino acid residue is D or E; (i) at position 12 the amino acid residue is T; (j) at position 14 the amino acid residue is E or D; (k) at position 15 the amino acid residue is L; (l) at position 17 the amino acid residue is H; (m) at position 18 the amino acid residue is R, E or K; (n) at position 19 the amino acid residue is I or V; (o) at position 24 the amino acid residue is Q; (p) at position 26 the amino acid residue is M, L, V or I; (q) at position 27 the amino acid residue is E; (r) at position 28 the amino acid residue is A or V; (s) at position 30 the amino acid residue is M; (t) at position 31 the amino acid residue is Y or F; (u) at position 32 the amino acid residue is E or D; (v) at position 33 the amino acid residue is T or S; (w) at position 35 the amino acid residue is L; (x) at position 37 the amino acid residue is R, G, E or Q; (y) at position 39 the amino acid residue is A or S; (z) at position 40 the amino acid residue is F or L; (aa) at position 45 the amino acid residue is Y or F; (ab) at position 47 the amino acid residue is R or G; (ac) at position 48 the amino acid residue is G; (ad) at position 49 the amino acid residue is K, R, or Q; (ae) at position 51 the amino acid residue is I or V; (af) at position 52 the amino acid residue is S; (ag) at position 53 the amino acid residue is I or V; (ah) at position 54 the amino acid residue is A; (ai) at position 57 the amino acid residue is H or N; (aj) at position 58 the amino acid residue is Q, K, R or P; (ak) at position 59 the amino acid residue is A; (al) at position 60 the amino acid residue is E; (am) at position 61 the amino acid residue is H or R; (an) at position 63 the amino acid residue is E or D; (ao) at position 65 the amino acid residue is E, P or Q; (ap) at position 67 the amino acid residue is Q or R; (aq) at position 68 the amino acid residue is K or E; (ar) at position 69 the amino acid residue is Q; (as) at position 79 the amino acid residue is E; (at) at position 80 the amino acid residue is G; (au) at position 81 the amino acid residue is Y, H or F; (av) at position 82 the amino acid residue is R; (aw) at position 83 the amino acid residue is E or D; (ax) at position 84 the amino acid residue is Q; (ay) at position 86 the amino acid residue is A; (az) at position 89 the amino acid residue is G, T or S; (ba) at position 90 the amino acid residue is L; (bb) at position 91 the amino acid residue is L, I or V; (bc) at position 92 the amino acid residue is R or K; (bd) at position 93 the amino acid residue is H; (be) at position 96 the amino acid residue is E or Q; (bf) at position 97 the amino acid residue is I; (bg) at position 100 the amino acid residue is K or N; (bh) at position 101 the amino acid residue is K or R; (bi) at position 103 the amino acid residue is A or V; (bj) at position 104 the amino acid residue is D; (bk) at position 105 the amino acid residue is M, L or I; (bl) at position 106 the amino acid residue is L; (bm) at position 112 the amino acid residue is T or A; (bn) at position 113 the amino acid residue is S or T; (bo) at position 114 the amino acid residue is A; (bp) at position 115 the amino acid residue is S; (bq) at position 119 the amino acid residue is K or R; (br) at position 120 the amino acid residue is K or R; (bs) at position 123 the amino acid residue is F or L; (bt) at position 125 the amino acid residue is E; (bu) at position 126 the amino acid residue is Q or H; (bv) at position 128 the amino acid residue is E or D; (bw) at position 129 the amino acid residue is V or I; (bx) at position 130 the amino acid residue is F; (by) at position 131 the amino acid residue is D or E; (bz) at position 132 the amino acid residue is T; (ca) at position 135 the amino acid residue is V; (cb) at position 138 the amino acid residue is H; (cc) at position 139 the amino acid residue is I; (cd) at position 140 the amino acid residue is L or M; (ce) at position 142 the amino acid residue is Y; (cf) at position 143 the amino acid residue is K or R; (cg) at position 145 the amino acid residue is L or I; and (ch) at position 146 the amino acid residue is T.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 90% of the amino acid residues in the amino acid sequence conform to the amino acid residue restrictions specified in (a)-(ch) above.

Some preferred isolated or recombinant polynucleotides of the invention are selected from the group consisting of: (a) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:577; (b) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:578; (c) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:621; (d) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:579; (e) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:602; (f) a nucleotide sequence encoding an amino acid sequence that is at least 95% identical to SEQ ID NO:697; (g) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:721; (h) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:613; (i) a nucleotide sequence encoding an amino acid sequence that is at least 89% identical to SEQ ID NO:677; (j) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:584; (k) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:707; (l) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:616; (m) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:612; and (n) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:590, and further wherein, of the amino acid residues in the amino acid sequence that correspond to the following positions, at least 80% conform to the following restrictions: (a) at positions 9, 76, 94 and 110 the amino acid residue is A; (b) at positions 29 and 108 the amino acid residue is C; (c) at position 34 the amino acid residue is D; (d) at position 95 the amino acid residue is E; (e) at position 56 the amino acid residue is F; (f) at positions 43, 44, 66, 74, 87, 102, 116, 122, 127 and 136 the amino acid residue is G; (g) at position 41 the amino acid residue is H; (h) at position 7 the amino acid residue is I; (i) at position 85 the amino acid residue is K; (j) at positions 20, 42, 50, 78 and 121 the amino acid residue is L; (k) at positions 1 and 141 the amino acid residue is M; (l) at positions 23 and 109 the amino acid residue is N; (m) at positions 22, 25, 133, 134 and 137 the amino acid residue is P; (n) at position 71 the amino acid residue is Q; (o) at positions 16, 21, 73, 99 and 111 the amino acid residue is R; (p) at position 55 the amino acid residue is S; (q) at position 77 the amino acid residue is T; (r) at position 107 the amino acid residue is W; and (s) at positions 13, 46, 70 and 118 the amino acid residue is Y.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence further comprising at least one amino acid residue that meets the following criteria: (a) at position 14 the amino acid residue is D; (b) at position 18 the amino acid residue is E; (c) at position 26 the amino acid residue is M or V; (d) at position 30 the amino acid residue is I; (e) at position 32 the amino acid residue is D; (f) at position 36 the amino acid residue is M or T; (g) at position 37 the amino acid residue is C; (h) at position 38 the amino acid residue is D; (i) at position 53 the amino acid residue is V; (j) at position 58 the amino acid residue is R; (k) at position 61 the amino acid residue is R; (l) at position 62 the amino acid residue is L; (m) at position 64 the amino acid residue is I or F; (n) at position 65 the amino acid residue is P; (o) at position 72 the amino acid residue is I; (p) at position 75 the amino acid residue is V; (q) at position 88 the amino acid residue is T; (r) at position 89 the amino acid residue is G; (s) at position 91 the amino acid residue is L; (t) at position 98 the amino acid residue is I; (u) at position 105 the amino acid residue is I; (v) at position 112 the amino acid residue is A; (w) at position 124 the amino acid residue is G or C; (x) at position 128 the amino acid residue is D; (y) at position 140 the amino acid residue is M; (z) at position 143 the amino acid residue is R; and (aa) at position 144 the amino acid residue is W.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence such that when the sequence is optimally aligned with SEQ ID NO: 300, 445, or 457, at least 80% of the amino acid residues in the amino acid sequence conform to the amino acid residue restrictions specified in (a) through (aa) above.

Some preferred isolated or recombinant polynucleotides of the invention comprise a nucleotide sequence which encodes an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least 96% identical to SEQ ID NO:919 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:917, 919, 921, 923, 925, 927, 833, 835, 839, 843, 845, 859, 863, 873, 877, 891, 895, 901, 905, 907, 913, 915, or 950); (b) an amino acid sequence that is at least 97% identical to SEQ ID NO:929 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:929, 931, 835, 843, 849, or 867); (c) an amino acid sequence that is at least 98% identical to SEQ ID NO:847 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:845 or 847); (d) an amino acid sequence that is at least 98% identical to SEQ ID NO:851; (e) an amino acid sequence that is at least 98% identical to SEQ ID NO:853; (f) an amino acid sequence that is at least 98% identical to SEQ ID NO:855 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:835 or 855); (g) an amino acid sequence that is at least 98% identical to SEQ ID NO:857; (h) an amino acid sequence that is at least 98% identical to SEQ ID NO:861 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:839, 861, or 883); (i) an amino acid sequence that is at least 98% identical to SEQ ID NO:871; (j) an amino acid sequence that is at least 98% identical to SEQ ID NO:875; (k) an amino acid sequence that is at least 98% identical to SEQ ID NO:881; (l) an amino acid sequence that is at least 98% identical to SEQ ID NO:885 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:845 or 885); (m) an amino acid sequence that is at least 98% identical to SEQ ID NO:887; (n) an amino acid sequence that is at least 98% identical to SEQ ID NO:889 (such as, for example, a nucleotide sequence which encodes SEQ ID NO: 863, 889, 891, or 903); (o) an amino acid sequence that is at least 98% identical to SEQ ID NO:893; (p) an amino acid sequence that is at least 98% identical to SEQ ID NO:897; (q) an amino acid sequence that is at least 98% identical to SEQ ID NO:899; (r) an amino acid sequence that is at least 98% identical to SEQ ID NO:909 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:883 or 909); (s) an amino acid sequence that is at least 98% identical to SEQ ID NO:911; (t) an amino acid sequence that is at least 99% identical to SEQ ID NO:837; (u) an amino acid sequence that is at least 99% identical to SEQ ID NO:841; (v) an amino acid sequence that is at least 99% identical to SEQ ID NO:865; (w) an amino acid sequence that is at least 99% identical to SEQ ID NO:869; and (x) an amino acid sequence that is at least 99% identical to SEQ ID NO:879.

Some preferred isolated or recombinant polynucleotides of the invention are selected from the group consisting of: (a) a nucleotide sequence encoding an amino acid sequence that is at least 96% identical to SEQ ID NO:919 (for example, a nucleotide sequence such as SEQ ID NO:916, 918, 920, 922, 924, 926, 832, 834, 838, 842, 844, 858, 862, 872, 876, 890, 894, 900, 904, 906, 912, 914, 939, 940, 941, 942, 943, 944, 949, 951 or 952); (b) a nucleotide sequence encoding an amino acid sequence that is at least 97% identical to SEQ ID NO:929 (for example, a nucleotide sequence such as SEQ ID NO:928, 930, 834, 842, 848, 866, 936 or 937); (c) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:847 (for example, a nucleotide sequence such as SEQ ID NO:844 or 846); (d) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:851 (for example, a nucleotide sequence such as SEQ ID NO:850); (e) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:853 (for example, a nucleotide sequence such as SEQ ID NO:852); (f) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:855 (for example, a nucleotide sequence such as SEQ ID NO:834 or 854); (g) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:857 (for example, a nucleotide sequence such as SEQ ID NO:856); (h) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:861 (for example, a nucleotide sequence such as SEQ ID NO:838, 860, or 882); (i) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:871 (for example, a nucleotide sequence such as SEQ ID NO:870); (j) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:875 (for example, a nucleotide sequence such as SEQ ID NO:874); (k) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:881 (for example, a nucleotide sequence such as SEQ ID NO:880); (l) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:885 (for example, a nucleotide sequence such as SEQ ID NO:844 or 884); (m) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:887 (for example, a nucleotide sequence such as SEQ ID NO:886); (n) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:889 (for example, a nucleotide sequence such as SEQ ID NO: 862, 888, 890, or 902); (o) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:893 (for example, a nucleotide sequence such as SEQ ID NO:892); (p) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:897 (for example, a nucleotide sequence such as SEQ ID NO:896); (q) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:899 (for example, a nucleotide sequence such as SEQ ID NO:898); (r) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:909 (for example, a nucleotide sequence such as SEQ ID NO:882 or 908); (s) a nucleotide sequence encoding an amino acid sequence that is at least 98% identical to SEQ ID NO:911 (for example, a nucleotide sequence such as SEQ ID NO:910); (t) a nucleotide sequence encoding an amino acid sequence that is at least 99% identical to SEQ ID NO:837 (for example, a nucleotide sequence such as SEQ ID NO:836); (u) a nucleotide sequence encoding an amino acid sequence that is at least 99% identical to SEQ ID NO:841 (for example, a nucleotide sequence such as SEQ ID NO:840); (v) a nucleotide sequence encoding an amino acid sequence that is at least 99% identical to SEQ ID NO:865 (for example, a nucleotide sequence such as SEQ ID NO:864); (w) a nucleotide sequence encoding an amino acid sequence that is at least 99% identical to SEQ ID NO:869 (for example, a nucleotide sequence such as SEQ ID NO:868); and (x) a nucleotide sequence encoding an amino acid sequence that is at least 99% identical to SEQ ID NO:879 (for example, a nucleotide sequence such as SEQ ID NO:878).
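The identity thresholds recited above (for example, "at least 96% identical to SEQ ID NO:919") presuppose a way of scoring two sequences. This passage does not state the alignment method, so the sketch below assumes the two sequences have already been aligned (gaps written as "-") and counts identity over columns in which neither sequence has a gap; other conventions, such as dividing by the full reference length, can give different numbers. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: identical residues / columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def max_mismatches(length: int, min_identity: int) -> int:
    """Largest number of mismatched columns (out of `length`) that still meets
    `min_identity` percent; uses integer ceiling division to avoid float error."""
    required_matches = -(-length * min_identity // 100)
    return length - required_matches
```

For example, over 146 ungapped columns a 96% threshold tolerates `max_mismatches(146, 96)` mismatches, and a 98% threshold tolerates `max_mismatches(146, 98)`.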

Some preferred isolated or recombinant polynucleotides of the invention comprise a nucleotide sequence encoding an amino acid sequence that is at least 95% identical to SEQ ID NO:929 and which comprises a Gly or an Asn residue at the amino acid position corresponding to position 33 of SEQ ID NO:929 (such as, for example, a nucleotide sequence which encodes SEQ ID NO:837, 849, 893, 897, 905, 921, 927, 929 or 931). Examples of such nucleotide sequences include SEQ ID NO:836, 848, 892, 896, 904, 920, 926, 928, 930, and 938.

Some preferred isolated or recombinant polynucleotides of the invention encode an amino acid sequence which further comprises one or more amino acid residues meeting the following criteria: (a) at position 41 the amino acid residue is H; (b) at position 138 the amino acid residue is H; (c) at position 34 the amino acid residue is N; and (d) at position 55 the amino acid residue is S.

While the polypeptides of the invention are sometimes described herein by a list of possible restrictions on which amino acid residues are found at particular positions, in some embodiments a polypeptide of the invention meets all of a particular set of possible restrictions. That is, where a list of possible restrictions is expressed herein as a list of options joined by the conjunction “and/or,” in some embodiments each such conjunction operates as an “and” rather than an “or.” In some embodiments, possible restrictions that are expressed as alternatives are all found in the polypeptide of the invention; this applies only where the alternatives are not mutually exclusive.

Sequence Variations

It will be appreciated by those skilled in the art that, due to the degeneracy of the genetic code, a multitude of nucleotide sequences encoding GAT polypeptides of the invention may be produced, some of which bear substantial identity to the nucleic acid sequences explicitly disclosed herein.

TABLE 1
Codon Table

Amino acid       Three-letter  One-letter  Codons
Alanine          Ala           A           GCA GCC GCG GCU
Cysteine         Cys           C           UGC UGU
Aspartic acid    Asp           D           GAC GAU
Glutamic acid    Glu           E           GAA GAG
Phenylalanine    Phe           F           UUC UUU
Glycine          Gly           G           GGA GGC GGG GGU
Histidine        His           H           CAC CAU
Isoleucine       Ile           I           AUA AUC AUU
Lysine           Lys           K           AAA AAG
Leucine          Leu           L           UUA UUG CUA CUC CUG CUU
Methionine       Met           M           AUG
Asparagine       Asn           N           AAC AAU
Proline          Pro           P           CCA CCC CCG CCU
Glutamine        Gln           Q           CAA CAG
Arginine         Arg           R           AGA AGG CGA CGC CGG CGU
Serine           Ser           S           AGC AGU UCA UCC UCG UCU
Threonine        Thr           T           ACA ACC ACG ACU
Valine           Val           V           GUA GUC GUG GUU
Tryptophan       Trp           W           UGG
Tyrosine         Tyr           Y           UAC UAU

For instance, inspection of the codon table (Table 1) shows that codons AGA, AGG, CGA, CGC, CGG, and CGU all encode the amino acid arginine. Thus, at every position in the nucleic acids of the invention where an arginine is specified by a codon, the codon can be altered to any of the corresponding codons described above without altering the encoded polypeptide. It is understood that U in an RNA sequence corresponds to T in a DNA sequence.

Using as an example the nucleic acid sequence corresponding to nucleotides 1-15 of SEQ ID NO: 1 (ATG ATT GAA GTC AAA (SEQ ID NO:826)), a silent variation of this sequence includes ATG ATC GAG GTG AAG (SEQ ID NO:827); both sequences encode the amino acid sequence MIEVK (SEQ ID NO:828), which corresponds to amino acids 1-5 of SEQ ID NO:6.
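The codon table and the silent-variation example can be checked mechanically. The sketch below transcribes Table 1 into a dictionary (writing T for U, per the correspondence noted above), translates SEQ ID NO:826 and a silent variant that keeps ATG, the only methionine codon, and lists synonymous codons for an arginine codon. Table 1 has no stop codons, so `translate` raises `KeyError` on them; all names are illustrative.

```python
# Table 1 transcribed with T in place of U (DNA alphabet); stop codons are not listed.
CODONS = {
    "A": "GCA GCC GCG GCT", "C": "TGC TGT", "D": "GAC GAT", "E": "GAA GAG",
    "F": "TTC TTT", "G": "GGA GGC GGG GGT", "H": "CAC CAT", "I": "ATA ATC ATT",
    "K": "AAA AAG", "L": "TTA TTG CTA CTC CTG CTT", "M": "ATG", "N": "AAC AAT",
    "P": "CCA CCC CCG CCT", "Q": "CAA CAG", "R": "AGA AGG CGA CGC CGG CGT",
    "S": "AGC AGT TCA TCC TCG TCT", "T": "ACA ACC ACG ACT", "V": "GTA GTC GTG GTT",
    "W": "TGG", "Y": "TAC TAT",
}
# Invert the table: codon -> one-letter amino acid code.
CODE = {codon: aa for aa, group in CODONS.items() for codon in group.split()}


def _normalize(seq: str) -> str:
    """Drop spaces, upper-case, and map RNA U to DNA T."""
    return seq.replace(" ", "").upper().replace("U", "T")


def translate(seq: str) -> str:
    """Translate a coding sequence; raises KeyError on codons absent from Table 1."""
    dna = _normalize(seq)
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(codon: str) -> list:
    """All codons (including `codon` itself) that encode the same amino acid."""
    return sorted(CODONS[CODE[_normalize(codon)]].split())


print(translate("ATG ATT GAA GTC AAA"))  # MIEVK
print(translate("ATG ATC GAG GTG AAG"))  # MIEVK
print(synonymous_codons("CGU"))  # ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT']
```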

Such “silent variations” are one species of “conservatively modified variations,” as discussed below. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and UGG, which is ordinarily the only codon for tryptophan) can be modified by standard techniques to encode a functionally identical polypeptide. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in any described sequence. The invention provides each and every possible variation of nucleic acid sequence encoding a polypeptide of the invention that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code (e.g., as set forth in Table 1) as applied to the nucleic acid sequence encoding a GAT homologue polypeptide of the invention. All such variations of every nucleic acid herein are specifically provided and described by consideration of the sequence in combination with the genetic code. Any variant can be produced as noted herein.
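Because each codon can be chosen independently among its synonyms, the number of distinct coding sequences for a polypeptide is the product of the per-residue codon counts from Table 1. A minimal sketch (the function name is hypothetical; the stop codon is ignored):

```python
from math import prod

# Codons per amino acid, counted from Table 1 (stop codons excluded).
DEGENERACY = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}


def count_coding_sequences(protein: str) -> int:
    """Number of distinct codon strings encoding `protein` (stop codon not included)."""
    return prod(DEGENERACY[aa] for aa in protein)


print(count_coding_sequences("MIEVK"))  # 48 = 1 * 3 * 2 * 4 * 2
```

Even the ten-residue region RPNQPLEACM already has tens of thousands of encodings, and the count grows multiplicatively with length, so for a full-length polypeptide enumerating the variants is impractical.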

A group of two or more different codons that, when translated in the same context, all encode the same amino acid are referred to herein as “synonymous codons.” As described herein, in some aspects of the invention a GAT polynucleotide is engineered for optimized codon usage in a desired host organism, for example a plant host. The terms “optimized” and “optimal” are not meant to be restricted to the very best possible combination of codons, but simply indicate that the coding sequence as a whole possesses an improved usage of codons relative to a precursor polynucleotide from which it was derived. Thus, in one aspect the invention provides a method for producing a GAT polynucleotide variant by replacing at least one parental codon in a nucleotide sequence with a synonymous codon that is preferentially used in a desired host organism, e.g., a plant, relative to the parental codon.
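The codon-replacement method above can be sketched as a lookup from amino acid to a host-preferred codon. The preference table `PREFERRED` is a hypothetical placeholder, not measured codon usage for any plant; a real implementation would derive preferences from host codon-usage data. The genetic code is built in the conventional TCAG order, with `*` marking stop codons.

```python
from itertools import product

# Standard genetic code in TCAG order (first, second, third base); '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# HYPOTHETICAL host preferences (amino acid -> preferred codon). Placeholder
# values for illustration only, not measured codon usage for any organism.
PREFERRED = {"I": "ATC", "E": "GAG", "V": "GTG", "K": "AAG"}


def optimize_codons(dna: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonymous codon, when one is listed."""
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        aa = CODE[codon]
        choice = preferred.get(aa, codon)
        if CODE[choice] != aa:  # guard: a replacement must be synonymous
            raise ValueError(f"{choice} does not encode {aa}")
        out.append(choice)
    return "".join(out)


parent = "ATGATTGAAGTCAAA"  # encodes MIEVK
variant = optimize_codons(parent, PREFERRED)
print(variant)  # ATGATCGAGGTGAAG
```

This sketch replaces every codon that has a listed preference; replacing only some of them, which the text also allows ("at least one parental codon"), would still yield a codon-optimized variant in the sense defined above.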

“Conservatively modified variations” or, simply, “conservative variations” of a particular nucleic acid sequence refer to those nucleic acids which encode identical or essentially identical amino acid sequences, or, where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. One of skill will recognize that individual substitutions, deletions or additions which alter, add or delete a single amino acid or a small percentage of amino acids (typically less than 5%, more typically less than 4%, 2% or 1%, or less) in an encoded sequence are “conservatively modified variations” where the alterations result in the deletion of an amino acid, addition of an amino acid, or substitution of an amino acid with a chemically similar amino acid.

Conservative substitution tables providing functionally similar amino acids are well known in the art. Table 2 sets forth six groups which contain amino acids that are “conservative substitutions” for one another.

TABLE 2
Conservative Substitution Groups

1   Alanine (A)         Serine (S)          Threonine (T)
2   Aspartic acid (D)   Glutamic acid (E)
3   Asparagine (N)      Glutamine (Q)
4   Arginine (R)        Lysine (K)
5   Isoleucine (I)      Leucine (L)         Methionine (M)   Valine (V)
6   Phenylalanine (F)   Tyrosine (Y)        Tryptophan (W)

Thus, “conservatively substituted variations” of a listed polypeptide sequence of the present invention include substitutions of a small percentage, typically less than 5%, more typically less than 2% and often less than 1%, of the amino acids of the polypeptide sequence, with a conservatively selected amino acid of the same conservative substitution group. Accordingly, a conservatively substituted variation of a polypeptide of the invention can contain 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 substitutions, each with an amino acid of the same conservative substitution group.

For example, a conservatively substituted variation of the polypeptide identified herein as SEQ ID NO:6 will contain “conservative substitutions,” according to the six groups defined above, in up to 7 residues (i.e., 5% of the amino acids, rounded down) in the 146 amino acid polypeptide.
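The group rule and the 5% ceiling can be combined into one check. This sketch treats Table 2 as the only source of conservative pairs (so C, G, H, and P have no conservative partner here) and considers substitutions only, not insertions or deletions; the function name and the default `max_fraction` are assumptions for illustration.

```python
import math

# Conservative substitution groups from Table 2.
GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]
GROUP_OF = {aa: i for i, members in enumerate(GROUPS) for aa in members}


def is_conservative_variant(reference: str, variant: str,
                            max_fraction: float = 0.05) -> bool:
    """True if `variant` differs from `reference` only by same-group substitutions,
    at no more than floor(max_fraction * len(reference)) positions.

    Assumes equal-length, ungapped sequences (substitutions only).
    """
    if len(reference) != len(variant):
        return False
    diffs = [(a, b) for a, b in zip(reference, variant) if a != b]
    limit = math.floor(max_fraction * len(reference))
    same_group = all(a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b)
                     for a, b in diffs)
    return same_group and len(diffs) <= limit


print(math.floor(0.05 * 146))  # 7 substitutions allowed in a 146-residue polypeptide
```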

In a further example, if four conservative substitutions were localized in the region corresponding to amino acids 21 to 30 of SEQ ID NO:6, examples of conservatively substituted variations of this region,

RPN QPL EAC M (SEQ ID NO:829), include:

KPQ QPV ESC M (SEQ ID NO:830) and

KPN NPL DAC V (SEQ ID NO:831) and the like, in accordance with the conservative substitutions listed in Table 2 (in the first example, the conservative substitutions are R to K, N to Q, L to V, and A to S; in the second, R to K, Q to N, E to D, and M to V). The listing of a protein sequence herein, in conjunction with the above substitution table, provides an express listing of all conservatively substituted proteins.
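The substitutions in the two example variants can be listed by comparing each one against RPN QPL EAC M position by position. This sketch numbers positions from 21 to match SEQ ID NO:6 and flags whether each change stays within a Table 2 group; the helper name `substitutions` is illustrative.

```python
# Conservative substitution groups from Table 2.
GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]
GROUP_OF = {aa: i for i, members in enumerate(GROUPS) for aa in members}


def substitutions(reference: str, variant: str, start: int = 1) -> list:
    """List (position, original, replacement, conservative?) for each differing residue."""
    out = []
    for offset, (a, b) in enumerate(zip(reference, variant)):
        if a != b:
            conservative = a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b)
            out.append((start + offset, a, b, conservative))
    return out


region = "RPNQPLEACM"  # residues 21-30 of SEQ ID NO:6 (SEQ ID NO:829)
for variant in ("KPQQPVESCM", "KPNNPLDACV"):  # SEQ ID NO:830 and 831
    print(variant, substitutions(region, variant, start=21))
```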

Finally, the addition of sequences which do not alter the encoded activity of a nucleic acid molecule, such as the addition of a non-functional or non-coding sequence, is a conservative variation of the basic nucleic acid.

One of skill will appreciate that many conservative variations of the nucleic acid constructs which are disclosed yield a functionally identical construct. For example, as discussed above, owing to the degeneracy of the genetic code, “silent substitutions” (i.e., substitutions in a nucleic acid sequence which do not result in an alteration in an encoded polypeptide) are an implied feature of every nucleic acid sequence which encodes an amino acid sequence. Similarly, “conservative amino acid substitutions,” in which one or a few amino acids in an amino acid sequence are substituted with different amino acids having highly similar properties, are also readily identified as being highly similar to a disclosed construct. Such conservative variations of each disclosed sequence are a feature of the present invention.

Non-conservative modifications of a particular nucleic acid are those which substitute any amino acid not characterized as a conservative substitution, for example, any substitution which crosses the bounds of the six groups set forth in Table 2. These include substitutions of acidic or amide-containing amino acids for hydrophobic amino acids (e.g., Asp, Glu, Asn, or Gln for Val, Ile, Leu or Met), aromatic amino acids for acidic or amide-containing amino acids (e.g., Phe, Tyr or Trp for Asp, Asn, Glu or Gln), or any other substitution not replacing an amino acid with a like amino acid.

Nucleic Acid Hybridization

Nucleic acids “hybridize” when they associate, typically in solution. Nucleic acids hybridize due to a variety of well-characterized physico-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes, Part I, Chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” (Elsevier, New York) (“Tijssen”), as well as in Ausubel, supra; Hames and Higgins (1995) Gene Probes 1, IRL Press at Oxford University Press, Oxford, England (“Hames and Higgins 1”); and Hames and Higgins (1995) Gene Probes 2, IRL Press at Oxford University Press, Oxford, England (“Hames and Higgins 2”). These references provide details on the synthesis, labeling, detection and quantification of DNA and RNA, including oligonucleotides.

“Stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments, such as Southern and northern hybridizations, are sequence dependent, and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993), supra, and in Hames and Higgins 1 and Hames and Higgins 2, supra.

For purposes of the present invention, generally, “highly stringent” hybridization and wash conditions are selected to be about 5° C. or less below the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH (as noted below, highly stringent conditions can also be referred to in comparative terms). The T_(m) is the temperature (under defined ionic strength and pH) at which 50% of the test sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the T_(m) for a particular probe.

The T_(m) of a nucleic acid duplex is the temperature at which the duplex is 50% denatured under the given conditions, and it represents a direct measure of the stability of the nucleic acid hybrid. Thus, the T_(m) corresponds to the midpoint of the transition from helix to random coil; for long stretches of nucleotides, it depends on length, nucleotide composition, and ionic strength.
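Because the T_(m) is defined as the 50% denaturation point, it can be read off a measured melting curve. The sketch below is illustrative only: `melting_midpoint` is a hypothetical helper, and it assumes temperatures in ascending order, a denatured fraction that rises through 0.5 (e.g., derived from UV absorbance), and simple linear interpolation rather than curve fitting.

```python
from typing import Sequence


def melting_midpoint(temps_c: Sequence[float], fraction_denatured: Sequence[float]) -> float:
    """Estimate T_m as the temperature at which the duplex is 50% denatured.

    Linearly interpolates between the two measurements that bracket 0.5.
    """
    points = list(zip(temps_c, fraction_denatured))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("the curve does not cross 50% denaturation")


# Hypothetical melting-curve readings (not data from this disclosure).
temps = [70.0, 75.0, 80.0, 85.0, 90.0]
denatured = [0.05, 0.20, 0.40, 0.80, 0.95]
print(round(melting_midpoint(temps, denatured), 2))  # 81.25
```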

After hybridization, unhybridized nucleic acid material can be removed by a series of washes, the stringency of which can be adjusted depending upon the desired results. Low stringency washing conditions (e.g., using higher salt and lower temperature) increase sensitivity, but can produce nonspecific hybridization signals and high background signals. Higher stringency conditions (e.g., using lower salt and a higher temperature that is closer to the hybridization temperature) lower the background signal, typically leaving only the specific signal. See Rapley, R. and Walker, J. M. eds., Molecular Biomethods Handbook (Humana Press, Inc. 1998) (hereinafter “Rapley and Walker”), which is incorporated herein by reference in its entirety for all purposes.

The T_(m) of a DNA-DNA duplex can be estimated using Equation 1 as follows:

$$T_m\,(^\circ\mathrm{C}) = 81.5\,^\circ\mathrm{C} + 16.6\,\log_{10} M + 0.41\,(\%\mathrm{G{+}C}) - 0.72\,(\%f) - 500/n$$

where M is the molarity of the monovalent cations (usually Na+), (%G+C) is the percentage of guanine (G) and cytosine (C) nucleotides, (%f) is the percentage of formamide and n is the number of nucleotide bases (i.e., length) of the hybrid. See Rapley and Walker, supra.
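Equation 1 translates directly into a small function. This is a sketch assuming the units defined above (molar cation concentration, G+C and formamide as percentages, length in nucleotides); the function name `tm_dna_dna` and the example inputs are hypothetical, and, as noted below for Equations 1 and 2, the estimate is only meaningful for hybrids longer than about 100-200 nucleotides.

```python
import math


def tm_dna_dna(na_molar: float, percent_gc: float, percent_formamide: float, length_nt: int) -> float:
    """Equation 1: estimated T_m (deg C) of a DNA-DNA hybrid.

    na_molar          -- molarity of monovalent cations (usually Na+)
    percent_gc        -- G+C content of the hybrid, in percent (0-100)
    percent_formamide -- formamide concentration, in percent
    length_nt         -- hybrid length n, in nucleotides
    """
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 0.72 * percent_formamide - 500.0 / length_nt)


# Hypothetical inputs: 0.1 M Na+, 50% G+C, 50% formamide, 500-nt hybrid.
tm = tm_dna_dna(0.1, 50.0, 50.0, 500)
print(round(tm, 1))        # 48.4
# "Highly stringent" conditions, as defined above, lie within about 5 deg C of T_m.
print(round(tm - 5.0, 1))  # 43.4
```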

The T_(m) of an RNA-DNA duplex can be estimated by using Equation 2 as follows:

$$T_m\,(^\circ\mathrm{C}) = 79.8\,^\circ\mathrm{C} + 18.5\,\log_{10} M + 0.58\,(\%\mathrm{G{+}C}) - 11.8\,(\%\mathrm{G{+}C})^2 - 0.56\,(\%f) - 820/n$$

where M is the molarity of the monovalent cations (usually Na+), (%G+C) is the percentage of guanine (G) and cytosine (C) nucleotides, (%f) is the percentage of formamide and n is the number of nucleotide bases (i.e., length) of the hybrid. Id. Equations 1 and 2 are typically accurate only for hybrid duplexes longer than about 100-200 nucleotides. Id.

The T_(m) of nucleic acid sequences shorter than 50 nucleotides can be estimated as follows:

$$T_m\,(^\circ\mathrm{C}) = 4(\mathrm{G}+\mathrm{C}) + 2(\mathrm{A}+\mathrm{T})$$

where A (adenine), C (cytosine), T (thymine), and G (guanine) are the numbers of the corresponding nucleotides.
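The short-sequence rule above counts bases directly, which a few lines of code make concrete. A sketch assuming a plain A/C/G/T string; `tm_short_oligo` is a hypothetical name and the 20-mer is a toy sequence.

```python
def tm_short_oligo(seq: str) -> int:
    """Estimate T_m (deg C) of an oligonucleotide shorter than 50 nt as 4(G+C) + 2(A+T)."""
    s = seq.upper()
    if len(s) >= 50:
        raise ValueError("this rule is intended for sequences shorter than 50 nucleotides")
    if set(s) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at


# Toy 20-mer with 10 G/C and 10 A/T bases: 4*10 + 2*10 = 60 deg C.
print(tm_short_oligo("ATGCATGCATGCATGCATGC"))  # 60
```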

An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42° C., with the hybridization being carried out overnight. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see Sambrook, supra, for a description of SSC buffer). Often the high stringency wash is preceded by a low stringency wash to remove background probe signal. An example of a low stringency wash is 2×SSC at 40° C. for 15 minutes.

In general, a signal to noise ratio 2.5×-5× (or more) higher than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Detection of at least stringent hybridization between two sequences in the context of the present invention indicates relatively strong structural similarity or homology to, e.g., the nucleic acids of the present invention provided in the sequence listings herein.

As noted, “highly stringent” conditions are selected to be about 5° C. or less below the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH. Target sequences that are closely related or identical to the nucleotide sequence of interest (e.g., the “probe”) can be identified under highly stringent conditions. Lower stringency conditions are appropriate for sequences that are less complementary. See, e.g., Rapley and Walker, supra.

Comparative hybridization can be used to identify nucleic acids of the invention, and this comparative hybridization method is a preferred method of distinguishing nucleic acids of the invention. Detection of highly stringent hybridization between two nucleotide sequences in the context of the present invention indicates relatively strong structural similarity/homology to, e.g., the nucleic acids provided in the sequence listing herein. Highly stringent hybridization between two nucleotide sequences demonstrates a degree of similarity or homology of structure, nucleotide base composition, arrangement or order that is greater than that detected by stringent hybridization conditions. In particular, detection of highly stringent hybridization in the context of the present invention indicates strong structural similarity or structural homology (e.g., nucleotide structure, base composition, arrangement or order) to, e.g., the nucleic acids provided in the sequence listings herein. For example, it is desirable to identify test nucleic acids that hybridize to the exemplar nucleic acids herein under stringent conditions.

Thus, one measure of stringent hybridization is the ability to hybridize to one of the listed nucleic acids (e.g., nucleic acid sequences SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952, and complementary polynucleotide sequences thereof) under highly stringent conditions (or very stringent conditions, or ultra-high stringency hybridization conditions, or ultra-ultra high stringency hybridization conditions). Stringent hybridization (as well as highly stringent, ultra-high stringency, or ultra-ultra high stringency hybridization conditions) and wash conditions can easily be determined empirically for any test nucleic acid.
For example, in determining highly stringent hybridization and wash conditions, the stringency of the hybridization and wash conditions is gradually increased (e.g., by increasing temperature, decreasing salt concentration, increasing detergent concentration and/or increasing the concentration of organic solvents, such as formamide, in the hybridization or wash) until a selected set of criteria is met. For example, the stringency of the hybridization and wash conditions is gradually increased until a probe comprising one or more nucleic acid sequences selected from SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952, and complementary polynucleotide sequences thereof, binds to a perfectly matched complementary target (again, a nucleic acid comprising one or more nucleic acid sequences selected from the same SEQ ID NOs, and complementary polynucleotide sequences thereof) with a signal to noise ratio that is at least about 2.5×, and optionally about 5× or more, as high as that observed for hybridization of the probe to an unmatched target. In this case, the unmatched target is a nucleic acid (other than those in the accompanying sequence listing) that is present in a public database such as GenBank™ at the time of filing of the subject application. Examples include GenBank Accession Nos. Z99109 and Y09476; additional such sequences can be identified in GenBank by one of ordinary skill in the art.
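The stepwise procedure above can be sketched as a loop that raises stringency until the matched-to-unmatched signal ratio reaches a chosen threshold. This is an illustration under simplifying assumptions: only wash temperature is varied (the text also allows salt, detergent and solvent changes), `find_stringent_temperature` and `measure_signal` are hypothetical names, and the readings are invented.

```python
from typing import Callable, Iterable, Optional


def find_stringent_temperature(
    temperatures_c: Iterable[float],
    measure_signal: Callable[[float, str], float],
    min_ratio: float = 2.5,
) -> Optional[float]:
    """Step through wash temperatures from lowest to highest and return the first
    one at which the probe's signal on the perfectly matched target is at least
    `min_ratio` times its signal on the unmatched control target.

    measure_signal(temp, target) is a hypothetical stand-in for running the assay
    and reading the label; `target` is "matched" or "unmatched".
    Returns None if no tested temperature meets the criterion.
    """
    for temp in sorted(temperatures_c):
        matched = measure_signal(temp, "matched")
        unmatched = measure_signal(temp, "unmatched")
        if matched > 0 and matched >= min_ratio * unmatched:
            return temp
    return None


# Invented readings: (matched signal, unmatched signal) per wash temperature.
readings = {40: (1000.0, 800.0), 50: (950.0, 500.0), 60: (900.0, 300.0), 65: (850.0, 150.0)}


def toy_measure(temp: float, target: str) -> float:
    matched, unmatched = readings[temp]
    return matched if target == "matched" else unmatched


print(find_stringent_temperature(readings, toy_measure))               # 60
print(find_stringent_temperature(readings, toy_measure, min_ratio=5))  # 65
```

Taking the first qualifying temperature mirrors the text's "gradually increased until" wording: the procedure stops at the least stringent condition that satisfies the criterion.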

A test nucleic acid is said to specifically hybridize to a probe nucleic acid when it hybridizes to the probe at least ½ as well as the perfectly matched complementary target does, i.e., with a signal to noise ratio at least ½ as high as that for hybridization of the probe to the target, under conditions in which the perfectly matched probe binds to the perfectly matched complementary target with a signal to noise ratio that is at least about 2×-10×, and occasionally 20×, 50× or more, as high as that observed for hybridization to any of the unmatched polynucleotides of GenBank Accession Nos. Z99109 and Y09476.

Ultra-high stringency hybridization and wash conditions are those in which the stringency of hybridization and wash conditions is increased until the signal to noise ratio for binding of the probe to the perfectly matched complementary target nucleic acid is at least 10× as high as that observed for hybridization to any of the unmatched target nucleic acids of GenBank Accession Nos. Z99109 and Y09476. A target nucleic acid which hybridizes to a probe under such conditions, with a signal to noise ratio of at least ½ that of the perfectly matched complementary target nucleic acid, is said to bind to the probe under ultra-high stringency conditions.

Similarly, even higher levels of stringency can be determined by gradually increasing the hybridization and/or wash conditions of the relevant hybridization assay, for example, until the signal to noise ratio for binding of the probe to the perfectly matched complementary target nucleic acid is at least 10×, 20×, 50×, 100×, or 500× or more as high as that observed for hybridization to any of the unmatched target nucleic acids of GenBank Accession Nos. Z99109 and Y09476. A target nucleic acid which hybridizes to a probe under such conditions, with a signal to noise ratio of at least ½ that of the perfectly matched complementary target nucleic acid, is said to bind to the probe under ultra-ultra-high stringency conditions.
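The two-part test described in the preceding paragraphs (conditions reach a given fold over every unmatched control, then the test target must reach at least ½ of the perfectly matched signal to noise ratio) can be sketched as follows. The function names and readings are hypothetical; the sketch treats the strongest unmatched-control signal as the noise term, which is one reasonable reading of "any of the unmatched target nucleic acids".

```python
from typing import Sequence


def stringency_reached(matched_signal: float, unmatched_signals: Sequence[float], fold: float) -> bool:
    """True if the probe binds its perfectly matched target at least `fold` times
    as strongly as it binds ANY of the unmatched control targets."""
    return matched_signal >= fold * max(unmatched_signals)


def binds_at_stringency(test_signal: float, matched_signal: float,
                        unmatched_signals: Sequence[float], fold: float) -> bool:
    """True if, under conditions reaching `fold`, the test target's signal to noise
    ratio is at least 1/2 that of the perfectly matched target. Both ratios share
    the same noise term, so the comparison reduces to the raw signals."""
    if not stringency_reached(matched_signal, unmatched_signals, fold):
        raise ValueError("conditions have not reached the requested stringency")
    return test_signal >= 0.5 * matched_signal


# Invented readings; the two controls stand in for Z99109 and Y09476.
controls = [100.0, 80.0]
print(stringency_reached(1200.0, controls, fold=10))          # True (ultra-high)
print(binds_at_stringency(700.0, 1200.0, controls, fold=10))  # True
print(binds_at_stringency(500.0, 1200.0, controls, fold=10))  # False
```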

Target nucleic acids which hybridize to the nucleic acids represented by SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952 under high, ultra-high and ultra-ultra high stringency conditions are a feature of the invention. Examples of such nucleic acids include those with one or a few silent or conservative nucleic acid substitutions as compared to a given nucleic acid sequence.

Nucleic acids which do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code, or when antisera generated against one or more of SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 946, 948, 950, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972, and subtracted using the polypeptides encoded by known nucleotide sequences, including that of GenBank Accession No. CAA70664, bind the polypeptides encoded by both nucleic acids. Further details on immunological identification of polypeptides of the invention are found below. Additionally, for distinguishing between duplexes with sequences of less than about 100 nucleotides, a tetramethylammonium chloride (TMAC) hybridization procedure known to those of ordinary skill in the art can be used. See, e.g., Sorg, U. et al. Nucleic Acids Res. (Sep. 11, 1991) 19(17), incorporated herein by reference in its entirety for all purposes.

In one aspect, the invention provides a nucleic acid which comprises a unique subsequence in a nucleic acid selected from SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952. The unique subsequence is unique as compared to a nucleic acid corresponding to any of GenBank Accession Nos. Z99109 and Y09476.
Such unique subsequences can be determined by aligning any of SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952 against the complete set of nucleic acids represented by GenBank Accession Nos. Z99109 and Y09476 or other related sequences available in public databases as of the filing date of the subject application. Alignment can be performed using the BLAST algorithm set to default parameters. Any unique subsequence is useful, e.g., as a probe to identify the nucleic acids of the invention.
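The idea of a subsequence that is "unique as compared to" reference sequences can be illustrated with exact k-mer matching. Note that this is a deliberately simplified substitute for the BLAST alignment named above, not an implementation of it: it only flags windows of a fixed length k that occur verbatim on neither strand of any reference. The function names are hypothetical and the sequences are toy strings, not entries from the sequence listing or GenBank.

```python
from typing import List, Set


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> List[str]:
    """All length-k windows of `seq`, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unique_subsequences(query: str, references: List[str], k: int) -> List[str]:
    """Length-k windows of `query` found on neither strand of any reference,
    in order of first appearance."""
    in_refs: Set[str] = set()
    for ref in references:
        in_refs.update(kmers(ref.upper(), k))
        in_refs.update(kmers(reverse_complement(ref), k))
    unique: List[str] = []
    seen: Set[str] = set()
    for window in kmers(query.upper(), k):
        if window not in in_refs and window not in seen:
            seen.add(window)
            unique.append(window)
    return unique


# Toy sequences only.
print(unique_subsequences("ACGTTGCA", ["ACGTAAAA"], k=4))
# ['CGTT', 'GTTG', 'TTGC', 'TGCA']
```

Checking both strands matters because a probe hybridizes to the complementary strand; an alignment tool such as BLAST would additionally catch near matches that this exact-match sketch ignores.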

Similarly, the invention includes a polypeptide which comprises a unique subsequence in a polypeptide selected from SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 946, 948, 950, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972. Here, the unique subsequence is unique as compared to a polypeptide corresponding to that of GenBank Accession No. CAA70664, and it is identified by aligning the polypeptide against the sequence represented by that accession number. If the sequence corresponds to a non-translated sequence such as a pseudogene, the corresponding polypeptide is generated simply by in silico translation of the nucleic acid sequence into an amino acid sequence, where the reading frame is selected to correspond to the reading frame of homologous GAT polynucleotides.
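The in silico translation step can be sketched as a frame-aware lookup against the standard genetic code. This is an illustration assuming NCBI translation table 1; `translate_frame` is a hypothetical helper, and the reading frame is simply passed in by the caller, whereas the text selects it to match homologous GAT polynucleotides (a choice this sketch does not attempt to automate).

```python
# Standard genetic code (NCBI translation table 1); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame(dna: str, frame: int) -> str:
    """In silico translation starting at 0-based offset `frame` (0, 1 or 2).

    Stops are written as '*' (a pseudogene may contain internal stops), codons
    containing non-ACGT characters become 'X', and a trailing partial codon is dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    s = dna.upper()
    return "".join(CODON_TABLE.get(s[i:i + 3], "X") for i in range(frame, len(s) - 2, 3))


# Toy sequence: frame 1 reads ATG GCT TAA; frame 0 reads GAT GGC TTA.
print(translate_frame("GATGGCTTAA", 1))  # MA*
print(translate_frame("GATGGCTTAA", 0))  # DGL
```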

The invention also provides for target nucleic acids which hybridize under stringent conditions to a unique coding oligonucleotide which encodes a unique subsequence in a polypeptide selected from SEQ ID NO: 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 815, 817, 819, 821, 823, 825, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, 931, 946, 948, 950, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, and 972, wherein the unique subsequence is unique as compared to a polypeptide corresponding to any of the control polypeptides. Unique sequences are determined as noted above.
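Because of codon degeneracy, a given peptide subsequence can be encoded by many different coding oligonucleotides. As a hedged illustration (not from the original text), the number of such oligonucleotides under the standard genetic code is the product of the codon counts of each residue; `coding_oligo_degeneracy` is a hypothetical name and the peptides are toy examples.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Number of codons for each amino acid (and for stop, "*").
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def coding_oligo_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode `peptide` under the standard code."""
    count = 1
    for residue in peptide.upper():
        if residue not in CODONS_PER_RESIDUE:
            raise ValueError(f"unrecognized residue: {residue!r}")
        count *= CODONS_PER_RESIDUE[residue]
    return count


# Toy peptides: Met and Trp have one codon each; Leu, Ser and Arg have six.
print(coding_oligo_degeneracy("MWM"))  # 1
print(coding_oligo_degeneracy("LSR"))  # 216
```

Stretches rich in single-codon residues (Met, Trp) give the least degenerate coding oligonucleotides, which is one practical consideration when choosing which unique subsequence to encode.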

In one example, the stringent conditions are selected such that an oligonucleotide perfectly complementary to the coding oligonucleotide hybridizes to the coding oligonucleotide with at least about a 2.5×-10× higher, preferably at least about a 5-10× higher, signal to noise ratio than for hybridization of the perfectly complementary oligonucleotide to a control nucleic acid corresponding to any of the control polypeptides. Conditions can be selected such that higher ratios of signal to noise are observed in the particular assay which is used, e.g., about 15×, 20×, 30×, 50× or more. In this example, the target nucleic acid hybridizes to the unique coding oligonucleotide with at least a 2× higher signal to noise ratio as compared to hybridization of the control nucleic acid to the coding oligonucleotide. Again, higher signal to noise ratios can be selected, e.g., about 2.5×, 5×, 10×, 20×, 30×, 50× or more. The particular signal will depend on the label used in the relevant assay, e.g., a fluorescent label, a colorimetric label, a radioactive label, or the like.

Vectors, Promoters and Expression Systems

The present invention also includes recombinant constructs comprising one or more of the nucleic acid sequences as broadly described above. The constructs comprise a vector, such as a plasmid, a cosmid, a phage, a virus, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), or the like, into which a nucleic acid sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available.

As previously discussed, general texts which describe molecular biological techniques useful herein, including the use of vectors, promoters and many other relevant topics, include Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology Volume 152 (Academic Press, Inc., San Diego, Calif.) (“Berger”); Sambrook et al., Molecular Cloning—A Laboratory Manual, 2d ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989 (“Sambrook”); and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (supplemented through 1999) (“Ausubel”). Examples of protocols sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), e.g., for the production of the homologous nucleic acids of the invention, are found in Berger, Sambrook, and Ausubel, as well as in Mullis et al. (1987) U.S. Pat. No. 4,683,202; Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press Inc. San Diego, Calif.) (“Innis”); Arnheim & Levinson (Oct. 1, 1990) C&EN 36-47; The Journal of NIH Research (1991) 3: 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173; Guatelli et al. (1990) Proc. Nat'l. Acad. Sci. USA 87: 1874; Lomeli et al. (1989) J. Clin. Chem. 35: 1826; Landegren et al. (1988) Science 241: 1077-1080; Van Brunt (1990) Biotechnology 8: 291-294; Wu and Wallace (1989) Genomics 4: 560; Barringer et al. (1990) Gene 89: 117; and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods for cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods for amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references cited therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, e.g., Ausubel, Sambrook and Berger, all supra.

The present invention also relates to engineered host cells that are transduced (transformed or transfected) with a vector of the invention (e.g., an invention cloning vector or an invention expression vector), as well as the production of polypeptides of the invention by recombinant techniques. The vector may be, for example, a plasmid, a viral particle, a phage, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the GAT homologue gene. Culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to those skilled in the art and in the references cited herein, including, e.g., Sambrook, Ausubel and Berger, as well as, e.g., Freshney (1994) Culture of Animal Cells: A Manual of Basic Technique, 3rd ed. (Wiley-Liss, New York) and the references cited therein.

GAT polypeptides of the invention can be produced in non-animal cells such as plants, yeast, fungi, bacteria and the like. In addition to Sambrook, Berger and Ausubel, details regarding non-animal cell culture can be found in Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems (John Wiley & Sons, Inc. New York, N.Y.); Gamborg and Phillips, eds. (1995) Plant Cell, Tissue and Organ Culture: Fundamental Methods/Springer Lab Manual (Springer-Verlag, Berlin); and Atlas and Parks, eds., The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, Fla.

Polynucleotides of the present invention can be incorporated into any one of a variety of expression vectors suitable for expressing a polypeptide. Suitable vectors include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived from combinations of plasmids and phage DNA; and viral DNA such as vaccinia, adenovirus, fowl pox virus, pseudorabies, adeno-associated viruses, retroviruses and many others. Any vector that transduces genetic material into a cell and, if replication is desired, is replicable and viable in the relevant host can be used.

When incorporated into an expression vector, a polynucleotide of the invention is operatively linked to an appropriate transcription control sequence (promoter) to direct mRNA synthesis. Examples of such transcription control sequences particularly suited for use in transgenic plants include the cauliflower mosaic virus (CaMV), figwort mosaic virus (FMV) and strawberry vein banding virus (SVBV) promoters, described in U.S. Provisional Application No. 60/245,354. Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses, and which can be used in some embodiments of the invention, include the SV40 promoter, the E. coli lac or trp promoter, and the phage lambda P_(L) promoter. An expression vector optionally contains a ribosome binding site for translation initiation, and a transcription terminator, such as PinII. The vector also optionally includes appropriate sequences for amplifying expression, e.g., an enhancer.

In addition, the expression vectors of the present invention optionally contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells. Usually, the selectable marker gene will encode antibiotic or herbicide resistance. Suitable genes include those coding for resistance to the antibiotic spectinomycin or streptomycin (e.g., the aadA gene), the streptomycin phosphotransferase (SPT) gene coding for streptomycin resistance, the neomycin phosphotransferase (NPTII) gene encoding kanamycin or geneticin resistance, and the hygromycin phosphotransferase (HPT) gene coding for hygromycin resistance. Additional selectable marker genes include dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, and tetracycline or ampicillin resistance in E. coli.

Suitable genes coding for resistance to herbicides include genes conferring resistance to herbicides that inhibit the action of acetolactate synthase (ALS), in particular the sulfonylurea-type herbicides (e.g., an ALS gene containing mutations leading to such resistance, in particular the S4 and/or Hra mutations); genes conferring resistance to herbicides that inhibit the action of glutamine synthetase, such as phosphinothricin or basta (e.g., the bar gene); and other such genes known in the art. The bar gene encodes resistance to the herbicide basta and the ALS gene encodes resistance to the herbicide chlorsulfuron. In some instances, the modified GAT genes are used as selectable markers.

Vectors of the present invention can be employed to transform an appropriate host to permit the host to express an inventive protein or polypeptide. Examples of appropriate expression hosts include: bacterial cells, such as E. coli, B. subtilis, Streptomyces, and Salmonella typhimurium; fungal cells, such as Saccharomyces cerevisiae, Pichia pastoris, and Neurospora crassa; insect cells such as Drosophila and Spodoptera frugiperda; mammalian cells such as CHO, COS, BHK, HEK 293 or Bowes melanoma; or plant cells or explants, etc. It is understood that not all cells or cell lines need to be capable of producing fully functional GAT polypeptides; for example, antigenic fragments of a GAT polypeptide may be produced. The present invention is not limited by the host cells employed.

In bacterial systems, a number of expression vectors may be selected depending upon the use intended for the GAT polypeptide. For example, when large quantities of GAT polypeptide or fragments thereof are needed for commercial production or for induction of antibodies, vectors which direct high-level expression of fusion proteins that are readily purified can be desirable. Such vectors include, but are not limited to, multifunctional E. coli cloning and expression vectors such as BLUESCRIPT (Stratagene), in which the GAT polypeptide coding sequence may be ligated into the vector in frame with sequences for the amino-terminal Met and the subsequent 7 residues of beta-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke & Schuster (1989) J. Biol. Chem. 264: 5503-5509); pET vectors (Novagen, Madison, Wis.); and the like.
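To make the in-frame requirement concrete, the following minimal Python sketch (an illustration, not part of the original disclosure) checks whether a coding insert joined to a short fusion-partner leader, such as a Met plus 7-residue leader of 24 nt, stays in frame and is free of premature stop codons. The function name `in_frame_fusion` and the example sequences are hypothetical, and the check ignores any vector sequence downstream of the insert.

```python
# Hypothetical helper: verify that an insert fused after an N-terminal
# leader keeps the leader's reading frame and has no premature stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_fusion(leader, insert):
    """Return leader + insert, raising ValueError if the hybrid would be
    out of frame or truncated by a premature in-frame stop codon."""
    leader, insert = leader.upper(), insert.upper()
    if len(leader) % 3 != 0:
        raise ValueError("leader is not a whole number of codons")
    fused = leader + insert
    # Split into complete codons in the leader's reading frame.
    codons = [fused[i:i + 3] for i in range(0, len(fused) - 2, 3)]
    # A stop codon anywhere before the last codon truncates the hybrid protein.
    for position, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            raise ValueError(f"premature stop codon at codon {position}")
    return fused
```

For example, `in_frame_fusion("ATGACCATGATTACGGATTCACTG", "GCTGCTTAA")` returns the joined sequence, whereas a 23-nt leader raises `ValueError` because the insert would be read out of frame.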

Similarly, in the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive or inducible promoters such as alpha factor, alcohol oxidase and PGH may be used for production of the GAT polypeptides of the invention. For reviews, see Ausubel (supra) and Grant et al. (1987) Methods in Enzymology 153:516-544.

In mammalian host cells, a variety of expression systems, including viral-based systems, may be utilized. In cases where an adenovirus is used as an expression vector, a coding sequence, e.g., of a GAT polypeptide, is optionally ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion of a GAT polypeptide coding region into a nonessential E1 or E3 region of the viral genome will result in a viable virus capable of expressing a GAT polypeptide in infected host cells (Logan and Shenk (1984) Proc. Nat'l. Acad. Sci. USA 81:3655-3659). In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.

Similarly, in plant cells, expression can be driven from a transgene integrated into a plant chromosome, or cytoplasmically from an episomal or viral nucleic acid. In the case of stably integrated transgenes, it is often desirable to provide sequences capable of driving constitutive or inducible expression of the GAT polynucleotides of the invention, for example, using viral (e.g., CaMV) or plant-derived regulatory sequences. Numerous plant-derived regulatory sequences have been described, including sequences which direct expression in a tissue-specific manner, e.g., TobRB7, patatin B33, GRP gene promoters, the rbcS-3A promoter, and the like. Alternatively, high-level expression can be achieved by transiently expressing exogenous sequences of a plant viral vector, e.g., TMV, BMV, etc. Typically, transgenic plants constitutively expressing a GAT polynucleotide of the invention will be preferred, and the regulatory sequences are selected to ensure constitutive stable expression of the GAT polypeptide.

Typical vectors useful for expression of nucleic acids in higher plants are well known in the art and include vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens described by Rogers et al. (1987) Meth. Enzymol. 153: 253-277. Exemplary A. tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 of Schardl et al. (1987) Gene 61:1-11 and Berger et al. (1989) Proc. Nat'l. Acad. Sci. U.S.A. 86: 8402-8406. Another useful vector herein is plasmid pBI101.2, which is available from Clontech Laboratories, Inc. (Palo Alto, Calif.). A variety of plant viruses that can be employed as vectors are known in the art and include cauliflower mosaic virus (CaMV), geminivirus, brome mosaic virus, and tobacco mosaic virus.

In some embodiments of the present invention, a GAT polynucleotide construct suitable for transformation of plant cells is prepared. For example, a desired GAT polynucleotide can be incorporated into a recombinant expression cassette to facilitate introduction of the gene into a plant and subsequent expression of the encoded polypeptide. An expression cassette will typically comprise a GAT polynucleotide, or functional fragment thereof, operably linked to a promoter sequence and other transcriptional and translational initiation regulatory sequences which will direct expression of the sequence in the intended tissues (e.g., entire plant, leaves, seeds) of the transformed plant.

For example, a strongly or weakly constitutive plant promoter can be employed which will direct expression of the GAT polypeptide in all tissues of a plant. Such promoters are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1′- or 2′-promoter derived from T-DNA of Agrobacterium tumefaciens, the ubiquitin 1 promoter, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S. Pat. No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, the GRP1-8 promoter and other transcription initiation regions from various plant genes known to those of skill. In situations in which overexpression of a GAT polynucleotide is detrimental to the plant or otherwise undesirable, one of skill, upon review of this disclosure, will recognize that weak constitutive promoters can be used for low levels of expression. In those cases where high levels of expression are not harmful to the plant, a strong promoter, e.g., a tRNA or other pol III promoter, or a strong pol II promoter, such as the cauliflower mosaic virus promoter, can be used.

Alternatively, a plant promoter may be under environmental control. Such promoters are referred to here as “inducible” promoters. Examples of environmental conditions that may affect transcription by inducible promoters include pathogen attack, anaerobic conditions, or the presence of light. In particular, examples of inducible promoters are the Adh1 promoter, which is inducible by hypoxia or cold stress; the Hsp70 promoter, which is inducible by heat stress; and the PPDK promoter, which is inducible by light. Also useful are promoters which are chemically inducible.

The promoters used in the present invention can be “tissue-specific” and, as such, under developmental control in that the polynucleotide is expressed only in certain tissues, such as leaves, roots, fruit, flowers and/or seeds. An exemplary promoter is the anther-specific promoter 5126 (U.S. Pat. Nos. 5,689,049 and 5,689,051). Examples of seed-preferred promoters include, but are not limited to, the 27 kD gamma zein promoter and the waxy promoter (Boronat et al. (1986) Plant Sci. 47: 95-102; Reina et al. (1990) Nucleic Acids Res. 18 (21): 6426; and Kloesgen et al. (1986) Mol. Gen. Genet. 203: 237-244). Promoters that express in the embryo, pericarp, and endosperm are disclosed in U.S. Patent Application Ser. Nos. 60/097,233 filed Aug. 20, 1998 and 60/098,230 filed Aug. 28, 1998. The disclosures of each of these are incorporated herein by reference in their entirety. In embodiments in which one or more nucleic acid sequences endogenous to the plant system are incorporated into the construct, the endogenous promoters (or variants thereof) from these genes can be employed for directing expression of the genes in the transfected plant. Tissue-specific promoters can also be used to direct expression of heterologous polynucleotides.

In general, the particular promoter used in the expression cassette in plants depends on the intended application. Either heterologous or non-heterologous (i.e., endogenous) promoters can be employed to direct expression of the nucleic acids of the present invention. These promoters can also be used, for example, in expression cassettes to drive expression of antisense nucleic acids to reduce, increase, or alter the concentration and/or composition of the proteins of the present invention in a desired tissue. Any of a number of promoters that direct transcription in plant cells is suitable. The promoter can be either constitutive or inducible. In addition to the promoters noted above, promoters of bacterial origin which operate in plants include the octopine synthase promoter, the nopaline synthase promoter and other promoters derived from native Ti plasmids (see Herrera-Estrella et al. (1983) Nature 303: 209-213). Viral promoters include the 35S and 19S RNA promoters of cauliflower mosaic virus (Odell et al. (1985) Nature 313: 810-812). Other plant promoters include the ribulose-1,5-bisphosphate carboxylase small subunit promoter and the phaseolin promoter. The promoter sequence from the E8 gene and other genes may also be used. The isolation and sequence of the E8 promoter is described in detail in Deikman and Fischer (1988) EMBO J. 7: 3315-3327.

To identify candidate promoters, the 5′ portions of a genomic clone are analyzed for sequences characteristic of promoter sequences. For instance, promoter sequence elements include the TATA box consensus sequence (TATAAT), which is usually 20 to 30 base pairs upstream of the transcription start site. In plants, further upstream from the TATA box, at positions −80 to −100, there is typically a promoter element with a series of adenines surrounding the trinucleotide G (or T) N G, as described by Messing et al. (1983) Genetic Engineering in Plants, eds. Kosuge et al., pp. 221-227.
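The first step of this promoter scan can be sketched in Python as a literal motif search within the −20 to −30 window. This is a simplified illustration under stated assumptions: the function name is hypothetical, it uses the TATAAT consensus exactly as given above, it allows no mismatches, and it does not look for the upstream adenine-rich element; practical promoter prediction tolerates degenerate matches.

```python
def find_tata_candidates(upstream, motif="TATAAT", min_dist=20, max_dist=30):
    """Return distances (bp upstream of the transcription start site) at
    which an exact copy of `motif` begins.

    `upstream` is the sequence immediately 5' of the start site; its last
    base is position -1.
    """
    upstream, motif = upstream.upper(), motif.upper()
    n = len(upstream)
    hits = []
    for start in range(n - len(motif) + 1):
        if upstream[start:start + len(motif)] == motif:
            distance = n - start  # the base at index `start` is position -(n - start)
            if min_dist <= distance <= max_dist:
                hits.append(distance)
    return hits
```

A motif that lies outside the window, for example 56 bp upstream, is simply not reported.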

In preparing polynucleotide constructs, e.g., vectors, of the invention, sequences other than the promoter and the conjoined polynucleotide can also be employed. If normal polypeptide expression is desired, a polyadenylation region at the 3′-end of a GAT-encoding region can be included. The polyadenylation region can be derived, for example, from a variety of plant genes, or from T-DNA. The 3′ end sequence to be added can be derived from, for example, the nopaline synthase or octopine synthase genes, or alternatively from another plant gene, or less preferably from any other eukaryotic gene.

An intron sequence can be added to the 5′ untranslated region of the coding sequence or the partial coding sequence to increase the amount of the mature message that accumulates. See, for example, Buchman and Berg (1988) Mol. Cell. Biol. 8: 4395-4405 and Callis et al. (1987) Genes Dev. 1: 1183-1200. The use of maize Adh1 introns 1, 2, and 6, and of the Bronze-1 intron, is known in the art. See generally Freeling and Walbot, eds. (1994) The Maize Handbook (Springer, New York), chapter 116.

The construct can also include a marker gene which confers a selectable phenotype on plant cells. For example, the marker may encode biocide tolerance, particularly antibiotic tolerance, such as tolerance to kanamycin, G418, bleomycin, or hygromycin, or herbicide tolerance, such as tolerance to chlorsulfuron or phosphinothricin (the active ingredient in the herbicides bialaphos and Basta).

Specific initiation signals can aid in efficient translation of a GAT polynucleotide-encoding sequence of the present invention. These signals can include, e.g., the ATG initiation codon and adjacent sequences. In cases where a GAT polypeptide-encoding sequence, its initiation codon and upstream sequences are inserted into an appropriate expression vector, no additional translational control signals may be needed. However, in cases where only the coding sequence (e.g., a mature protein coding sequence), or a portion thereof, is inserted, exogenous translational control signals, including the initiation codon, must be provided. Furthermore, the initiation codon must be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons can be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers appropriate to the cell system in use (Scharf et al. (1994) Results Probl. Cell Differ. 20: 125-62 and Bittner et al. (1987) Methods in Enzymol 153: 516-544).
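The initiation-codon and reading-frame requirements described above can be checked with a short Python sketch such as the following. The helper names are hypothetical, and the check covers only the ATG, the codon count, and in-frame stop codons. It does not evaluate the sequence context around the ATG or any other translational control signal.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(cds):
    """Report whether `cds` starts with ATG, is a whole number of codons,
    and is free of premature stops when read in the frame of its first base."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return {
        "starts_with_atg": cds.startswith("ATG"),
        "whole_codons": len(cds) % 3 == 0,
        "premature_stop": any(c in STOP_CODONS for c in codons[:-1]),
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
    }


def add_initiation_codon(cds):
    """Prepend ATG to a bare coding sequence (e.g., a mature-protein region)."""
    return cds if cds.upper().startswith("ATG") else "ATG" + cds
```

A single extra base in front of the ATG is enough for `starts_with_atg` and `whole_codons` to come back `False`, which is the frame error the paragraph warns about.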

Secretion/Localization Sequences

Polynucleotides of the invention can also be fused, for example, in frame to nucleic acids encoding a secretion/localization sequence, to target polypeptide expression to a desired cellular compartment, membrane, or organelle of a host cell, or to direct polypeptide secretion to the periplasmic space or into the cell culture media. Such sequences are known to those of skill, and include secretion leader peptides, organelle targeting sequences (e.g., nuclear localization sequences, ER retention signals, mitochondrial transit sequences, and chloroplast transit sequences), membrane localization/anchor sequences (e.g., stop transfer sequences, GPI anchor sequences), and the like.

In a preferred embodiment, a polynucleotide of the invention is fused in frame with an N-terminal chloroplast transit sequence (or chloroplast transit peptide sequence) derived from a gene encoding a polypeptide that is normally targeted to the chloroplast. Such sequences are typically rich in serine and threonine; are deficient in aspartate, glutamate, and tyrosine; and generally have a central domain rich in positively charged amino acids.
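These compositional features can be turned into a rough screening heuristic. The Python sketch below is an illustration only: the function names are hypothetical, the "central domain" is crudely taken to be the middle third of the peptide, only Lys and Arg are counted as positively charged, and the thresholds in `looks_like_transit_peptide` are arbitrary placeholders rather than published cut-offs.

```python
def _fraction(seq, residues):
    """Fraction of positions in `seq` occupied by any of `residues`."""
    return sum(seq.count(r) for r in residues) / len(seq) if seq else 0.0


def transit_peptide_profile(peptide):
    """Composition features described for chloroplast transit peptides."""
    peptide = peptide.upper()
    n = len(peptide)
    central = peptide[n // 3: 2 * n // 3]  # middle third as a crude "central domain"
    return {
        "ser_thr": _fraction(peptide, "ST"),
        "asp_glu_tyr": _fraction(peptide, "DEY"),
        "central_positive": _fraction(central, "KR"),  # Lys + Arg only
    }


def looks_like_transit_peptide(profile, min_st=0.2, max_dey=0.05, min_pos=0.1):
    # Thresholds are arbitrary illustrative values, not published cut-offs.
    return (profile["ser_thr"] >= min_st
            and profile["asp_glu_tyr"] <= max_dey
            and profile["central_positive"] >= min_pos)
```

Any real candidate sequence would still need experimental confirmation of chloroplast import; the heuristic only flags peptides whose composition matches the description above.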

Expression Hosts

In a further embodiment, the present invention relates to host cells containing the above-described constructs. The host cell can be a eukaryotic cell, such as a mammalian cell, a yeast cell, or a plant cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE-Dextran mediated transfection, electroporation, or other common techniques (Davis et al., Basic Methods in Molecular Biology).

A host cell is optionally chosen for its ability to modulate the expression of the inserted sequences or to process the expressed protein in the desired fashion. Such modifications of the protein include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation and acylation. Post-translational processing that cleaves a “pre” or a “prepro” form of the protein may also be important for correct insertion, folding and/or function. Different host cells such as E. coli, Bacillus sp., yeast or mammalian cells such as CHO, HeLa, BHK, MDCK, 293, WI38, etc. have specific cellular machinery and characteristic mechanisms, e.g., for post-translational activities, and may be chosen to ensure the desired modification and processing of the introduced, foreign protein.

For long-term, high-yield production of recombinant proteins, stable expression systems can be used. For example, plant cells, explants or tissues, e.g., shoots or leaf discs, which stably express a polypeptide of the invention are transduced using expression vectors which contain viral origins of replication or endogenous expression elements and a selectable marker gene. Following the introduction of the vector, cells may be allowed to grow for a period determined to be appropriate for the cell type, e.g., 1 or more hours for bacterial cells, 1-4 days for plant cells, or 2-4 weeks for some plant explants, in an enriched medium before they are switched to selective medium. The purpose of the selectable marker is to confer resistance to selection, and its presence allows growth and recovery of cells which successfully express the introduced sequences. For example, transgenic plants expressing the polypeptides of the invention can be selected directly for resistance to the herbicide glyphosate. Resistant embryos derived from stably transformed explants can be proliferated, e.g., using tissue culture techniques appropriate to the cell type.

Host cells transformed with a nucleotide sequence encoding a polypeptide of the invention are optionally cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture. The protein or fragment thereof produced by a recombinant cell may be secreted, membrane-bound, or contained intracellularly, depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing GAT polynucleotides of the invention can be designed with signal sequences which direct secretion of the mature polypeptides through a prokaryotic or eukaryotic cell membrane.

Additional Polypeptide Sequences

Polynucleotides of the present invention may also comprise a coding sequence fused in frame to a marker sequence that, e.g., facilitates purification of the encoded polypeptide. Such purification-facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, a sequence which binds glutathione (e.g., GST), a hemagglutinin (HA) tag (corresponding to an epitope derived from the influenza hemagglutinin protein; Wilson et al. (1984) Cell 37: 767), maltose binding protein sequences, the FLAG epitope utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle, Wash.), and the like. The inclusion of a protease-cleavable polypeptide linker sequence between the purification domain and the GAT homologue sequence is useful to facilitate purification. One expression vector contemplated for use in the compositions and methods described herein provides for expression of a fusion protein comprising a polypeptide of the invention fused to a polyhistidine region separated by an enterokinase cleavage site. The histidine residues facilitate purification on IMAC (immobilized metal ion affinity chromatography, as described in Porath et al. (1992) Protein Expression and Purification 3: 263-281), while the enterokinase cleavage site provides a means for separating the GAT homologue polypeptide from the fusion protein. pGEX vectors (Promega; Madison, Wis.) may also be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to ligand-agarose beads (e.g., glutathione-agarose in the case of GST fusions) followed by elution in the presence of free ligand.
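The polyhistidine/enterokinase arrangement described above can be modeled at the protein-sequence level as follows. This Python sketch is hypothetical: it assumes an N-terminal Met, a hexahistidine tag, and the DDDDK enterokinase recognition sequence (enterokinase cleaves on the C-terminal side of the Lys), and it cuts only at the first site, whereas in practice a DDDDK-like motif inside the target protein could also be cleaved.

```python
ENTEROKINASE_SITE = "DDDDK"  # enterokinase cleaves on the C-terminal side of this Lys


def build_his_fusion(target, n_his=6):
    """Hypothetical layout: N-terminal Met + polyhistidine + enterokinase site + target."""
    return "M" + "H" * n_his + ENTEROKINASE_SITE + target


def enterokinase_cleave(fusion):
    """Split at the first enterokinase site into (tag portion, released target)."""
    idx = fusion.find(ENTEROKINASE_SITE)
    if idx < 0:
        raise ValueError("no enterokinase site found")
    cut = idx + len(ENTEROKINASE_SITE)
    return fusion[:cut], fusion[cut:]
```

With a hypothetical target `"MKTAYIAK"`, the fusion is `"MHHHHHHDDDDKMKTAYIAK"`, and cleavage releases the target with no residual tag residues at its N-terminus.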

Polypeptide Production and Recovery

Following transduction of a suitable host and growth of the host cells to an appropriate cell density, the selected promoter is induced by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period. Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification. Microbial cells employed in the expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents, or other methods, which are well known to those skilled in the art.

As noted, many references are available for the culture and production of many cells, including cells of bacterial, plant, animal (especially mammalian) and archaebacterial origin. See, e.g., Sambrook, Ausubel, and Berger (all supra), as well as Freshney (1994) Culture of Animal Cells: A Manual of Basic Technique, 3rd ed. (Wiley-Liss, New York) and the references cited therein; Doyle and Griffiths (1997) Mammalian Cell Culture: Essential Techniques (John Wiley and Sons, NY); Humason (1979) Animal Tissue Techniques, 4th ed. (W.H. Freeman and Company); and Ricciardelli, et al. (1989) In vitro Cell Dev. Biol. 25: 1016-1024. For plant cell culture and regeneration, see Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems (John Wiley & Sons, Inc., New York, N.Y.); Gamborg and Phillips, eds. (1995) Plant Cell, Tissue and Organ Culture: Fundamental Methods/Springer Lab Manual (Springer-Verlag, Berlin); Jones, ed. (1984) Plant Gene Transfer and Expression Protocols (Humana Press, Totowa, N.J.); and Croy, ed. (1993) Plant Molecular Biology (Bios Scientific Publishers, Oxford, U.K.), ISBN 0 12 198370 6. Cell culture media in general are set forth in Atlas and Parks, eds. (1993) The Handbook of Microbiological Media (CRC Press, Boca Raton, Fla.). Additional information for cell culture is found in available commercial literature such as the Life Science Research Cell Culture Catalogue (1998) from Sigma-Aldrich, Inc. (St Louis, Mo.) (“Sigma-LSRCCC”) and, e.g., The Plant Culture Catalogue and supplement (1997) also from Sigma-Aldrich, Inc. (St Louis, Mo.) (“Sigma-PCCS”). Further details regarding plant cell transformation and transgenic plant production are found below.

Polypeptides of the invention can be recovered and purified from recombinant cell cultures by any of a number of methods well known in the art, including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography (e.g., using any of the tagging systems noted herein), hydroxylapatite chromatography, and lectin chromatography. Protein refolding steps can be used, as desired, in completing the configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed in the final purification steps. In addition to the references noted supra, a variety of purification methods are well known in the art, including, e.g., those set forth in Sandana (1997) Bioseparation of Proteins (Academic Press, Inc.); Bollag et al. (1996) Protein Methods, 2nd ed. (Wiley-Liss, NY); Walker (1996) The Protein Protocols Handbook (Humana Press, NJ); Harris and Angal (1990) Protein Purification Applications: A Practical Approach (IRL Press at Oxford, Oxford, England); Harris and Angal Protein Purification Methods: A Practical Approach (IRL Press at Oxford, Oxford, England); Scopes (1993) Protein Purification: Principles and Practice, 3rd ed. (Springer Verlag, NY); Janson and Ryden (1998) Protein Purification: Principles, High Resolution Methods and Applications, 2nd ed. (Wiley-VCH, NY); and Walker (1998) Protein Protocols on CD-ROM (Humana Press, NJ).

In some cases, it is desirable to produce the GAT polypeptide of the invention on a large scale suitable for industrial and/or commercial applications. In such cases, bulk fermentation procedures are employed. Briefly, a GAT polynucleotide, e.g., a polynucleotide comprising any one of SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952, or other nucleic acids encoding GAT polypeptides of the invention, can be cloned into an expression vector. For example, U.S. Pat. No. 5,955,310 to Widner et al., “METHODS FOR PRODUCING A POLYPEPTIDE IN A BACILLUS CELL,” describes a vector with tandem promoters and stabilizing sequences operably linked to a polypeptide-encoding sequence. After inserting the polynucleotide of interest into a vector, the vector is transformed into a bacterial host, e.g., Bacillus subtilis strain PL1801IIE (amyE, apr, npr, spoIIE::Tn917). The introduction of an expression vector into a Bacillus cell may, for instance, be effected by protoplast transformation (see, e.g., Chang and Cohen (1979) Mol. Gen. Genet. 168:111), by using competent cells (see, e.g., Young and Spizizen (1961) J. Bacteriol. 81:823, or Dubnau and Davidoff-Abelson (1971) J. Mol. Biol. 56: 209), by electroporation (see, e.g., Shigekawa and Dower (1988) Biotechniques 6: 742), or by conjugation (see, e.g., Koehler and Thorne (1987) J. Bacteriol. 169: 5271); see also Ausubel, Sambrook and Berger, all supra.

The transformed cells are cultivated in a nutrient medium suitable for production of the polypeptide using methods that are known in the art. For example, the cells may be cultivated by shake flask cultivation, or by small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid state fermentations) in laboratory or industrial fermentors, performed in a suitable medium and under conditions allowing the polypeptide to be expressed and/or isolated. The cultivation takes place in a suitable nutrient medium comprising carbon and nitrogen sources and inorganic salts, using procedures known in the art. Suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is secreted, it can be recovered directly from the medium.

The resulting polypeptide may be isolated by methods known in the art. For example, the polypeptide may be isolated from the nutrient medium by conventional procedures including, but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. The isolated polypeptide may then be further purified by a variety of procedures known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity, hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation), or extraction (see, e.g., Bollag et al. (1996) Protein Methods, 2nd ed. (Wiley-Liss, NY) and Walker (1996) The Protein Protocols Handbook (Humana Press, NJ)).

Cell-free transcription/translation systems can also be employed to produce polypeptides using DNAs or RNAs of the present invention. Several such systems are commercially available. A general guide to in vitro transcription and translation protocols is found in Tymms (1995) In vitro Transcription and Translation Protocols: Methods in Molecular Biology (Garland Publishing, NY), vol. 37.

Substrates and Formats for Sequence Recombination

The polynucleotides of the invention are optionally used as substrates for a variety of diversity generating procedures, e.g., mutation, recombination and recursive recombination reactions, in addition to their use in standard cloning methods as set forth in, e.g., Ausubel, Berger and Sambrook, to produce additional GAT polynucleotides and polypeptides with desired properties. A variety of diversity generating protocols are available and described in the art. The procedures can be used separately and/or in combination to produce one or more variants of a polynucleotide or set of polynucleotides, as well as variants of encoded proteins. Individually and collectively, these procedures provide robust, widely applicable ways of generating diversified polynucleotides and sets of polynucleotides (including, e.g., polynucleotide libraries) useful, e.g., for the engineering or rapid evolution of polynucleotides, proteins, pathways, cells and/or organisms with new and/or improved characteristics. The process of altering the sequence can result in, for example, single nucleotide substitutions, multiple nucleotide substitutions, and insertion or deletion of regions of the nucleic acid sequence.
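As a toy illustration of one diversity generating step, random point substitution, the following Python sketch produces a small library of variants from a parent sequence. The function name and parameters are hypothetical. Real mutagenesis (e.g., error-prone PCR) yields a distribution of mutation counts and biased substitution spectra, not a fixed number of uniformly chosen changes per variant as modeled here.

```python
import random

BASES = "ACGT"


def random_point_mutants(parent, n_variants, n_mutations, seed=0):
    """Return `n_variants` copies of `parent`, each carrying exactly
    `n_mutations` single-base substitutions at distinct positions."""
    rng = random.Random(seed)  # seeded so the same library can be regenerated
    variants = []
    for _ in range(n_variants):
        seq = list(parent.upper())
        for pos in rng.sample(range(len(seq)), n_mutations):
            # Substitute with any base other than the current one.
            seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
        variants.append("".join(seq))
    return variants
```

Because positions are drawn without replacement, every variant differs from the parent at exactly `n_mutations` sites, which keeps the toy library easy to reason about.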

While distinctions and classifications are made in the course of the ensuing discussion for clarity, it will be appreciated that the techniques are often not mutually exclusive. Indeed, the various methods can be used singly or in combination, in parallel or in series, to access diverse sequence variants.

The result of any of the diversity generating procedures described herein can be the generation of one or more polynucleotides, which can be selected or screened for polynucleotides that encode proteins with, or which confer, desirable properties. Following diversification by one or more of the methods described herein, or otherwise available to one of skill, any polynucleotides that are produced can be selected for a desired activity or property, e.g., altered Km for glyphosate, altered Km for acetyl CoA, use of alternative cofactors (e.g., propionyl CoA), increased kcat, etc. This can include identifying any activity that can be detected, for example, in an automated or automatable format, by any of the assays in the art. For example, GAT homologs with increased specific activity can be detected by assaying the conversion of glyphosate to N-acetylglyphosate, e.g., by mass spectrometry. Alternatively, improved ability to confer resistance to glyphosate can be assayed by growing bacteria transformed with a nucleic acid of the invention on agar containing increasing concentrations of glyphosate, or by spraying transgenic plants incorporating a nucleic acid of the invention with glyphosate. A variety of related (or even unrelated) properties can be evaluated, in series or in parallel, at the discretion of the practitioner. Additional details regarding recombination and selection for herbicide tolerance can be found, e.g., in “DNA SHUFFLING TO PRODUCE HERBICIDE RESISTANT CROPS” (U.S. Pub. No. 2002/0058249) filed Aug. 12, 1999.
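Selection for kinetic properties such as increased kcat or altered Km is often summarized by catalytic efficiency, kcat/Km. The Python sketch below ranks variants against a parent on that basis. The class and function names, the two-fold threshold, and any kinetic values supplied to it are hypothetical; a real screen would use values measured with an assay such as the mass-spectrometric detection of N-acetylglyphosate mentioned above.

```python
from dataclasses import dataclass


@dataclass
class VariantKinetics:
    name: str
    kcat: float           # turnover number, s^-1
    km_glyphosate: float  # Michaelis constant for glyphosate, mM

    @property
    def efficiency(self) -> float:
        """Catalytic efficiency kcat/Km, in mM^-1 s^-1."""
        return self.kcat / self.km_glyphosate


def select_improved(variants, parent, fold=2.0):
    """Keep variants whose kcat/Km is at least `fold` times the parent's,
    ranked from most to least efficient."""
    threshold = fold * parent.efficiency
    hits = [v for v in variants if v.efficiency >= threshold]
    return sorted(hits, key=lambda v: v.efficiency, reverse=True)
```

With a hypothetical parent at kcat = 1.0 s^-1 and Km = 2.0 mM (efficiency 0.5 mM^-1 s^-1), the default two-fold threshold keeps only variants with kcat/Km of at least 1.0 mM^-1 s^-1.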

Descriptions of a variety of diversity generating procedures, including multigene shuffling and methods for generating modified nucleic acid sequences encoding multiple enzymatic domains, are found in the following publications and the references cited therein: Soong, N. et al. (2000) Nat. Genet. 25(4): 436-39; Stemmer, et al. (1999) Tumor Targeting 4: 1-4; Ness et al. (1999) Nature Biotech. 17:893-896; Chang et al. (1999) Nature Biotech. 17: 793-797; Minshull and Stemmer (1999) Current Opinion in Chemical Biology 3: 284-290; Christians et al. (1999) Nature Biotech. 17: 259-264; Crameri et al. (1998) Nature 391: 288-291; Crameri et al. (1997) Nature Biotech. 15: 436-438; Zhang et al. (1997) Proc. Nat'l. Acad. Sci. USA 94: 4504-4509; Patten et al. (1997) Current Opinion in Biotech. 8: 724-733; Crameri et al. (1996) Nature Med. 2: 100-103; Crameri et al. (1996) Nature Biotech. 14:315-319; Gates et al. (1996) J. Mol. Biol. 255: 373-386; Stemmer (1996) “Sexual PCR and Assembly PCR” in The Encyclopedia of Molecular Biology (VCH Publishers, New York) pp. 447-457; Crameri and Stemmer (1995) BioTechniques 18: 194-195; Stemmer et al. (1995) Gene 164: 49-53; Stemmer (1995) Science 270: 1510; Stemmer (1995) Bio/Technology 13: 549-553; Stemmer (1994) Nature 370: 389-391; and Stemmer (1994) Proc. Nat'l. Acad. Sci. USA 91: 10747-10751.
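Recombination-based methods such as DNA shuffling generate chimeras that switch between parental templates at homologous positions. The following Python sketch is a deliberately simplified model, not the laboratory procedure: it assumes the parents are already aligned to equal length, places crossovers at random positions, and ignores the fragmentation, annealing, and reassembly steps that determine where crossovers can actually occur. The function name is hypothetical.

```python
import random


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimera from aligned, equal-length parent sequences by
    choosing a random parent for each segment between crossover points."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        # Adjacent segments may come from the same parent (no visible switch).
        segments.append(rng.choice(parents)[start:end])
    return "".join(segments)
```

Every position in the resulting chimera is inherited from one of the parents at that same position, which is the property a homology-dependent crossover preserves.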

Mutational methods of generating diversity include, for example, site-directed mutagenesis (Ling et al. (1997) “Approaches to DNA mutagenesis: an overview” Anal Biochem. 254(2): 157-178; Dale et al. (1996) “Oligonucleotide-directed random mutagenesis using the phosphorothioate method” Methods Mol. Biol. 57:369-374; Smith (1985) “In vitro mutagenesis” Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) “Strategies and applications of in vitro mutagenesis” Science 229:1193-1201; Carter (1986) “Site-directed mutagenesis” Biochem. J. 237:1-7; and Kunkel (1987) “The efficiency of oligonucleotide directed mutagenesis” in Nucleic Acids & Molecular Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin)); mutagenesis using uracil-containing templates (Kunkel (1985) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Methods in Enzymol. 154, 367-382; and Bass et al. (1988) “Mutant Trp repressors with new DNA-binding specificities” Science 242:240-245); oligonucleotide-directed mutagenesis (Methods in Enzymol. 100: 468-500 (1983); Methods in Enzymol. 154: 329-350 (1987); Zoller & Smith (1982) “Oligonucleotide-directed mutagenesis using M13-derived vectors: an efficient and general procedure for the production of point mutations in any DNA fragment” Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) “Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors” Methods in Enzymol. 100:468-500; and Zoller & Smith (1987) “Oligonucleotide-directed mutagenesis: a simple method using two oligonucleotide primers and a single-stranded DNA template” Methods in Enzymol. 154:329-350); phosphorothioate-modified DNA mutagenesis (Taylor et al. (1985) “The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA” Nucl. Acids Res. 13: 8749-8764; Taylor et al. (1985) “The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA” Nucl. Acids Res. 13: 8765-8787; Nakamaye & Eckstein (1986) “Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis” Nucl. Acids Res. 14: 9679-9698; Sayers et al. (1988) “5′-3′ Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis” Nucl. Acids Res. 16:791-802; and Sayers et al. (1988) “Strand specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide” Nucl. Acids Res. 16: 803-814); and mutagenesis using gapped duplex DNA (Kramer et al. (1984) “The gapped duplex DNA approach to oligonucleotide-directed mutation construction” Nucl. Acids Res. 12: 9441-9456; Kramer & Fritz (1987) “Oligonucleotide-directed construction of mutations via gapped duplex DNA” Methods in Enzymol. 154:350-367; Kramer et al. (1988) “Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations” Nucl. Acids Res. 16: 7207; and Fritz et al. (1988) “Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro” Nucl. Acids Res. 16: 6987-6999).

Additional suitable methods include point mismatch repair (Kramer et al. (1984) “Point Mismatch Repair” Cell 38: 879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) “Improved oligonucleotide site-directed mutagenesis using M13 vectors” Nucl. Acids Res. 13: 4431-4443; and Carter (1987) “Improved oligonucleotide-directed mutagenesis using M13 vectors” Methods in Enzymol. 154: 382-403), deletion mutagenesis (Eghtedarzadeh & Henikoff (1986) “Use of oligonucleotides to generate large deletions” Nucl. Acids Res. 14: 5115), restriction-selection and restriction-purification (Wells et al. (1986) “Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin” Phil. Trans. R. Soc. Lond. A 317: 415-423), mutagenesis by total gene synthesis (Nambiar et al. (1984) “Total synthesis and cloning of a gene coding for the ribonuclease S protein” Science 223: 1299-1301; Sakmar and Khorana (1988) “Total synthesis and expression of a gene for the α-subunit of bovine rod outer segment guanine nucleotide-binding protein (transducin)” Nucl. Acids Res. 14: 6361-6372; Wells et al. (1985) “Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites” Gene 34: 315-323; and Grundström et al. (1985) “Oligonucleotide-directed mutagenesis by microscale ‘shot-gun’ gene synthesis” Nucl. Acids Res. 13: 3305-3316), and double-strand break repair (Mandecki (1986) “Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis” Proc. Natl. Acad. Sci. USA 83: 7177-7181; and Arnold (1993) “Protein engineering for unusual environments” Current Opinion in Biotechnology 4: 450-455). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for troubleshooting problems with various mutagenesis methods.

Additional details regarding various diversity generating methods can be found in the following U.S. patents, PCT publications, and EPO publications: U.S. Pat. No. 5,605,793 to Stemmer (Feb. 25, 1997), “Methods for In vitro Recombination;” U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22, 1998), “Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;” U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), “DNA Mutagenesis by Random Fragmentation and Reassembly;” U.S. Pat. No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998), “End-Complementary Polymerase Reaction;” U.S. Pat. No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), “Methods and Compositions for Cellular and Metabolic Engineering;” WO 95/22625 by Stemmer and Crameri, “Mutagenesis by Random Fragmentation and Reassembly;” WO 96/33207 by Stemmer and Lipschutz, “End Complementary Polymerase Chain Reaction;” WO 97/20078 by Stemmer and Crameri, “Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;” WO 97/35966 by Minshull and Stemmer, “Methods and Compositions for Cellular and Metabolic Engineering;” WO 99/41402 by Punnonen et al., “Targeting of Genetic Vaccine Vectors;” WO 99/41383 by Punnonen et al., “Antigen Library Immunization;” WO 99/41369 by Punnonen et al., “Genetic Vaccine Vector Engineering;” WO 99/41368 by Punnonen et al., “Optimization of Immunomodulatory Properties of Genetic Vaccines;” EP 752008 by Stemmer and Crameri, “DNA Mutagenesis by Random Fragmentation and Reassembly;” EP 0932670 by Stemmer, “Evolving Cellular DNA Uptake by Recursive Sequence Recombination;” WO 99/23107 by Stemmer et al., “Modification of Virus Tropism and Host Range by Viral Genome Shuffling;” WO 99/21979 by Apt et al., “Human Papillomavirus Vectors;” WO 98/31837 by del Cardayre et al., “Evolution of Whole Cells and Organisms by Recursive Sequence Recombination;” WO 98/27230 by Patten and Stemmer, “Methods and Compositions for Polypeptide Engineering;” WO 98/13487 by Stemmer et al., “Methods for Optimization of Gene Therapy by Recursive Sequence Shuffling and Selection;” WO 00/00632, “Methods for Generating Highly Diverse Libraries;” WO 00/09679, “Methods for Obtaining in vitro Recombined Polynucleotide Sequence Banks and Resulting Sequences;” WO 98/42832 by Arnold et al., “Recombination of Polynucleotide Sequences Using Random or Defined Primers;” WO 99/29902 by Arnold et al., “Method for Creating Polynucleotide and Polypeptide Sequences;” WO 98/41653 by Vind, “An in vitro Method for Construction of a DNA Library;” WO 98/41622 by Borchert et al., “Method for Constructing a Library Using DNA Shuffling;” WO 98/42727 by Pati and Zarling, “Sequence Alterations using Homologous Recombination;” WO 00/18906 by Patten et al., “Shuffling of Codon-Altered Genes;” WO 00/04190 by del Cardayre et al., “Evolution of Whole Cells and Organisms by Recursive Recombination;” WO 00/42561 by Crameri et al., “Oligonucleotide Mediated Nucleic Acid Recombination;” WO 00/42559 by Selifonov and Stemmer, “Methods of Populating Data Structures for Use in Evolutionary Simulations;” WO 00/42560 by Selifonov et al., “Methods for Making Character Strings, Polynucleotides & Polypeptides Having Desired Characteristics;” WO 01/23401 by Welch et al., “Use of Codon-Varied Oligonucleotide Synthesis for Synthetic Shuffling;” and WO 01/64864 by Affholter, “Single-Stranded Nucleic Acid Template-Mediated Recombination and Nucleic Acid Fragment Isolation.”

Certain U.S. applications provide additional details regarding various diversity generating methods, including “SHUFFLING OF CODON ALTERED GENES” by Patten et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/407,800); “EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION” by del Cardayre et al., filed Jul. 15, 1998 (U.S. Ser. No. 09/166,188) and Jul. 15, 1999 (U.S. Pat. No. 6,379,964); “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Sep. 28, 1999 (U.S. Pat. No. 6,376,246); “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Jan. 18, 2000 (WO 00/42561); “USE OF CODON-BASED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., filed Sep. 28, 1999 (U.S. Pat. No. 6,436,675); “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jan. 18, 2000 (WO 00/42560); “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579); “METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS” by Selifonov and Stemmer, filed Jan. 18, 2000 (WO 00/42559); and “SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION” by Affholter (U.S. Ser. No. 60/186,482, filed Mar. 2, 2000).

In brief, several different general classes of sequence modification methods, such as mutation and recombination, are applicable to the present invention and are set forth in the references above. That is, the component nucleic acid sequences can be altered to produce modified gene fusion constructs by any of the protocols described, either before or after the sequences are joined. The following formats exemplify some of the preferred types of diversity generation in the context of the present invention, including certain recombination-based formats.

Nucleic acids can be recombined in vitro by any of a variety of techniques discussed in the references above, including, e.g., DNase digestion of the nucleic acids to be recombined followed by ligation and/or PCR reassembly. For example, sexual PCR mutagenesis can be used. In this method, the DNA molecule is fragmented randomly (or pseudo-randomly, or even non-randomly). The fragments of DNA molecules with different but related sequences then recombine in vitro on the basis of sequence similarity, and each crossover is fixed by extension in a polymerase chain reaction. This process and many of its variants are described in several of the references above, e.g., in Stemmer (1994) Proc. Nat'l. Acad. Sci. USA 91: 10747-10751.
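The core idea of sexual PCR can be pictured in silico. Related parents exchange sequence only where they are similar enough to reanneal. The sketch below is a toy model under stated simplifications. The parents are assumed to be pre-aligned and of equal length. Crossovers are allowed only at columns where all parents are identical, which stands in for homology-dependent template switching. `crossover_prob` and `shuffle_parents` are hypothetical names, and the model does not simulate the fragmentation or annealing chemistry itself.

```python
import random

def shuffle_parents(parents, crossover_prob, rng):
    """Build one chimera from aligned, equal-length parent sequences.

    The chimera is copied base by base from a current template parent.
    At columns where every parent has the same base (a stand-in for the
    sequence identity needed for reannealing), the template may switch to
    a randomly chosen parent with probability crossover_prob.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    template = rng.randrange(len(parents))
    chimera = []
    for i in range(length):
        column = {p[i] for p in parents}
        # Template switches (crossovers) only inside identical columns.
        if len(column) == 1 and rng.random() < crossover_prob:
            template = rng.randrange(len(parents))
        chimera.append(parents[template][i])
    return "".join(chimera)

parents = ["ATGAAACCCGGGTTT", "ATGTAACCCGCGTTA"]
rng = random.Random(0)
library = {shuffle_parents(parents, 0.3, rng) for _ in range(200)}
print(len(library), "distinct sequences")
```

Every position of a chimera inherits a base that occurs in some parent at that position. When `crossover_prob` is 0, the output is simply one of the parents.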

Similarly, nucleic acids can be recursively recombined in vivo, e.g., by allowing recombination to occur between nucleic acids in cells. Many such in vivo recombination formats are set forth in the references noted above. Such formats optionally provide direct recombination between nucleic acids of interest, or recombination between vectors, viruses, plasmids, etc., comprising the nucleic acids of interest, as well as other formats.

Whole genome recombination methods can also be used, in which whole genomes of cells or other organisms are recombined, optionally including spiking of the genomic recombination mixtures with desired library components (e.g., genes corresponding to the pathways of the present invention). These methods have many applications, including those in which the identity of a target gene is not known. Details on such methods are found, e.g., in WO 98/31837 by del Cardayre et al., “Evolution of Whole Cells and Organisms by Recursive Sequence Recombination;” and in WO 00/04190 by del Cardayre et al., also entitled “Evolution of Whole Cells and Organisms by Recursive Sequence Recombination.” Thus, any of these processes and techniques for recombination, recursive recombination, and whole genome recombination, alone or in combination, can be used to generate the modified nucleic acid sequences and/or modified gene fusion constructs of the present invention.

Synthetic recombination methods can also be used, in which oligonucleotides corresponding to targets of interest are synthesized and reassembled in PCR or ligation reactions that include oligonucleotides corresponding to more than one parental nucleic acid, thereby generating new recombined nucleic acids. Oligonucleotides can be made by standard nucleotide addition methods or, e.g., by trinucleotide synthetic approaches. Details regarding such approaches are found in the references noted above, including, e.g., WO 00/42561 by Crameri et al., “Oligonucleotide Mediated Nucleic Acid Recombination;” WO 01/23401 by Welch et al., “Use of Codon-Varied Oligonucleotide Synthesis for Synthetic Shuffling;” WO 00/42560 by Selifonov et al., “Methods for Making Character Strings, Polynucleotides and Polypeptides Having Desired Characteristics;” and WO 00/42559 by Selifonov and Stemmer, “Methods of Populating Data Structures for Use in Evolutionary Simulations.”
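Synthetic shuffling can cover every combination of parental variants, not only those that happen to cross over by homology. The following sketch makes three assumptions. The parents are aligned. They are cut into non-overlapping segments of a hypothetical length `segment_len`, one segment per oligonucleotide. One oligonucleotide is made for each distinct parental version of each segment. Real designs use overlapping oligonucleotides so that PCR or ligation can assemble them; that detail is omitted here.

```python
from itertools import product

def synthetic_shuffling_library(parents, segment_len):
    """Enumerate all chimeras obtainable by assembling one oligonucleotide
    per segment, where each segment may come from any parent."""
    length = len(parents[0])
    segments = []
    for start in range(0, length, segment_len):
        # Distinct parental versions of this segment, sorted so the output
        # order is deterministic.
        variants = sorted({p[start:start + segment_len] for p in parents})
        segments.append(variants)
    # The library is the Cartesian product of the segment choices.
    return ["".join(combo) for combo in product(*segments)]

parents = ["AAACCCGGG", "AAATCCGGA"]
chimeras = synthetic_shuffling_library(parents, 3)
print(chimeras)
```

In this example the first segment is identical in both parents and the other two segments each have two versions. The library therefore holds 1 × 2 × 2 = 4 sequences. Two of them are the parents and two are new recombinants.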

In silico methods of recombination can also be used, in which genetic algorithms running on a computer recombine sequence strings that correspond to homologous (or even non-homologous) nucleic acids. The resulting recombined sequence strings are optionally converted into nucleic acids by synthesizing nucleic acids that correspond to the recombined sequences, e.g., in concert with oligonucleotide synthesis and gene reassembly techniques. This approach can generate random, partially random, or designed variants. WO 00/42560 by Selifonov et al., “Methods for Making Character Strings, Polynucleotides and Polypeptides Having Desired Characteristics” and WO 00/42559 by Selifonov and Stemmer, “Methods of Populating Data Structures for Use in Evolutionary Simulations” give extensive details on in silico recombination. These details cover the use of genetic algorithms, genetic operators and the like in computer systems, combined with generation of the corresponding nucleic acids (and/or proteins). They also cover combinations of designed nucleic acids and/or proteins (e.g., based on crossover site selection), as well as designed, pseudo-random, or random recombination methods. This methodology is generally applicable to the present invention. It provides for in silico recombination of nucleic acid sequences and/or gene fusion constructs encoding proteins involved in various metabolic pathways (such as carotenoid, ectoine, polyhydroxyalkanoate, and aromatic polyketide biosynthetic pathways), and/or for generation of the corresponding nucleic acids or proteins.
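A genetic algorithm of the kind described can be quite small. The following sketch is a generic genetic algorithm, not the method of the cited applications. It uses single-point crossover as the genetic operator and truncation selection: the fitter half of the population is kept each generation. The `fitness` argument is a hypothetical scoring function, standing in for whatever predicted or measured property would rank the sequence strings. The population needs at least two members.

```python
import random

def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequence strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(population, fitness, generations, rng):
    """Keep the fitter half of the population each generation (truncation
    selection) and refill it with crossovers between random survivors.
    Survivors are carried over unchanged, so the best score never drops."""
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, size // 2)]
        children = [crossover(*rng.sample(survivors, 2), rng)
                    for _ in range(size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

# Toy fitness for illustration only: count of "A" characters.
toy_fitness = lambda s: s.count("A")
best = evolve(["AAAATTTT", "TTTTAAAA"] * 3, toy_fitness, 10, random.Random(3))
print(best, toy_fitness(best))
```

In practice the winning strings would then be synthesized as nucleic acids, for example by oligonucleotide assembly, and tested experimentally.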

Many methods of accessing natural diversity can be similarly used. In these methods, diverse nucleic acids or nucleic acid fragments are hybridized to single-stranded templates and then polymerized and/or ligated to regenerate full-length sequences. The templates are optionally degraded afterwards and the resulting modified nucleic acids recovered. In one method employing a single-stranded template, the fragment population derived from the genomic library(ies) is annealed with partial or, often, approximately full-length ssDNA or RNA corresponding to the opposite strand. Complex chimeric genes are then assembled from this population in three steps: nuclease-based removal of non-hybridizing fragment ends, polymerization to fill gaps between the fragments, and single-stranded ligation. The parental polynucleotide strand can be removed by digestion (e.g., if RNA or uracil-containing), by magnetic separation under denaturing conditions (if labeled in a manner conducive to such separation), or by other available separation/purification methods. Alternatively, the parental strand is optionally co-purified with the chimeric strands and removed during subsequent screening and processing steps. Additional details regarding this approach are found, e.g., in “Single-Stranded Nucleic Acid Template-Mediated Recombination and Nucleic Acid Fragment Isolation” by Affholter, WO 01/64864.

In another approach, single-stranded molecules are converted to double-stranded DNA (dsDNA) and the dsDNA molecules are bound to a solid support by ligand-mediated binding. After separation of unbound DNA, the selected DNA molecules are released from the support and introduced into a suitable host cell to generate a library of enriched sequences which hybridize to the probe. A library produced in this manner provides a desirable substrate for further diversification using any of the procedures described herein.

Any of the preceding general recombination formats can be practiced in a reiterative fashion (e.g., one or more cycles of mutation/recombination or other diversity generation methods, optionally followed by one or more selection methods) to generate a more diverse set of recombinant nucleic acids.

Mutagenesis employing polynucleotide chain termination methods has also been proposed (see, e.g., U.S. Pat. No. 5,965,408, “Method of DNA reassembly by interrupting synthesis” to Short, and the references above), and can be applied to the present invention. In this approach, double-stranded DNAs corresponding to one or more genes sharing regions of sequence similarity are combined and denatured, in the presence or absence of primers specific for the gene. The single-stranded polynucleotides are then annealed and incubated in the presence of a polymerase and a chain terminating reagent, resulting in the production of partial duplex molecules. Suitable chain terminating reagents include, e.g., ultraviolet, gamma, or X-ray irradiation; ethidium bromide or other intercalators; DNA binding proteins, such as single-strand binding proteins, transcription activating factors, or histones; polycyclic aromatic hydrocarbons; trivalent chromium or a trivalent chromium salt; abbreviated polymerization mediated by rapid thermocycling; and the like. The partial duplex molecules, e.g., containing partially extended chains, are then denatured and reannealed in subsequent rounds of replication or partial replication. The result is a set of polynucleotides that share varying degrees of sequence similarity and are diversified with respect to the starting population of DNA molecules. Optionally, the products, or partial pools of the products, can be amplified at one or more stages in the process. Polynucleotides produced by a chain termination method, such as described above, are suitable substrates for any other described recombination format.

Diversity also can be generated in nucleic acids or populations of nucleic acids using a recombinational procedure termed “incremental truncation for the creation of hybrid enzymes” (“ITCHY”), described in Ostermeier et al. (1999) “A combinatorial approach to hybrid enzymes independent of DNA homology” Nature Biotech. 17: 1205. This approach can be used to generate an initial library of variants which can optionally serve as a substrate for one or more in vitro or in vivo recombination methods. See also Ostermeier et al. (1999) “Combinatorial Protein Engineering by Incremental Truncation,” Proc. Natl. Acad. Sci. USA 96: 3562-67; and Ostermeier et al. (1999) “Incremental Truncation as a Strategy in the Engineering of Novel Biocatalysts,” Biological and Medicinal Chemistry 7: 2139-44.
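ITCHY differs from homology-based shuffling in one respect: it fuses every N-terminal fragment of one gene to every C-terminal fragment of another, whether or not the two genes share sequence similarity. The sketch below enumerates that idealized library as strings. The `step` parameter is a hypothetical addition. With `step=3`, only truncations at codon boundaries are kept, so every fusion stays in frame. Actual ITCHY libraries are made with time-dependent exonuclease digestion, and many of their members are out of frame.

```python
def itchy_library(gene_a, gene_b, step=1):
    """Enumerate idealized incremental-truncation hybrids: an N-terminal
    fragment of gene_a (lengths step, 2*step, ...) fused to a C-terminal
    fragment of gene_b (starting at 0, step, 2*step, ...)."""
    hybrids = []
    for a_len in range(step, len(gene_a) + 1, step):
        for b_start in range(0, len(gene_b), step):
            hybrids.append(gene_a[:a_len] + gene_b[b_start:])
    return hybrids

in_frame = itchy_library("ATGAAA", "CCCTGA", step=3)
print(in_frame)
```

The library size grows as the product of the two gene lengths divided by `step` squared. This rapid growth is why ITCHY output is often used as a starting library for further selection or recombination, as the text notes.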

Mutational methods which alter individual nucleotides, or groups of contiguous or non-contiguous nucleotides, can be favorably employed to introduce nucleotide diversity into the nucleic acid sequences and/or gene fusion constructs of the present invention. Many mutagenesis methods are found in the above-cited references. Additional details regarding mutagenesis methods, which can also be applied to the present invention, can be found in the following.

For example, error-prone PCR can be used to generate nucleic acid variants. Using this technique, PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product. Examples of such techniques are found in the references above and, e.g., in Leung et al. (1989) Technique 1: 11-15 and Caldwell et al. (1992) PCR Methods Applic. 2: 28-33. Similarly, assembly PCR can be used, in a process which involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions can occur in parallel in the same reaction mixture, with the products of one reaction priming the products of another reaction.
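The outcome of error-prone PCR is often summarized as an average number of substitutions per gene. That average is roughly the per-base error rate multiplied by the template length. The sketch below simulates this under simplifying assumptions. Only substitutions are modeled, there is no polymerase mutational bias, and no insertions or deletions occur. The value `error_rate=0.01` is a hypothetical illustration, not a measured polymerase error rate.

```python
import random

BASES = "ACGT"

def error_prone_copy(seq, error_rate, rng):
    """Copy seq, substituting each base with probability error_rate
    by one of the other three bases chosen uniformly."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def mutation_count(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(42)
parent = "ATG" + "GCT" * 100  # 303-nt toy template
variants = [error_prone_copy(parent, 0.01, rng) for _ in range(1000)]
mean = sum(mutation_count(parent, v) for v in variants) / len(variants)
print(f"mean substitutions per copy: {mean:.2f}")
```

At a rate of 0.01 per base over 303 nucleotides, the expected mean is about 3 substitutions per copy. Increasing the rate adds diversity, but each copy also becomes more likely to carry an inactivating mutation.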

Oligonucleotide-directed mutagenesis can be used to introduce site-specific mutations in a nucleic acid sequence of interest. Examples of such techniques are found in the references above and, e.g., in Reidhaar-Olson et al. (1988) Science 241: 53-57. Similarly, cassette mutagenesis can be used in a process that replaces a small region of a double-stranded DNA molecule with a synthetic oligonucleotide cassette that differs from the native sequence. The oligonucleotide can contain, e.g., completely and/or partially randomized native sequence(s).
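The cassette replacement step can be sketched at the sequence level. The randomized cassette below is written in IUPAC degenerate notation (N = any base, K = G or T, S = C or G). The "NNK" design is a common convention chosen for illustration; the source does not specify it. The function and example sequence are hypothetical. The function also requires the cassette to match the length of the replaced region, a simplification that keeps the reading frame intact.

```python
import random

# Subset of IUPAC nucleotide codes used in randomized cassettes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "S": "CG"}

def cassette_mutant(seq, start, end, cassette, rng):
    """Replace seq[start:end] with one random realization of a degenerate
    cassette such as "NNKNNK" (one base drawn per IUPAC position)."""
    if len(cassette) != end - start:
        raise ValueError("cassette length must match the replaced region")
    insert = "".join(rng.choice(IUPAC[c]) for c in cassette)
    return seq[:start] + insert + seq[end:]

rng = random.Random(7)
wild_type = "ATGGCTGCTGCTTAA"
mutant = cassette_mutant(wild_type, 6, 12, "NNKNNK", rng)
print(wild_type, "->", mutant)
```

Each call gives one library member. The flanking sequence is unchanged, while the two codons inside the cassette are randomized within the NNK constraint.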

Recursive ensemble mutagenesis is a process in which an algorithm for protein mutagenesis is used to produce diverse populations of phenotypically related mutants, members of which differ in amino acid sequence. This method uses a feedback mechanism to monitor successive rounds of combinatorial cassette mutagenesis. Examples of this approach are found in Arkin & Youvan (1992) Proc. Nat'l. Acad. Sci. USA 89: 7811-7815.

Exponential ensemble mutagenesis can be used for generating combinatorial libraries with a high percentage of unique and functional mutants. Small groups of residues in a sequence of interest are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins. Examples of such procedures are found in Delegrave & Youvan (1993) Biotech. Res. 11: 1548-1552.

In vivo mutagenesis can be used to generate random mutations in any cloned DNA of interest by propagating the DNA, e.g., in a strain of E. coli that carries mutations in one or more of the DNA repair pathways. These “mutator” strains have a higher random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually generate random mutations within the DNA. Such procedures are described in the references noted above.

Other procedures for introducing diversity into a genome, e.g., a bacterial, fungal, animal, or plant genome, can be used in conjunction with the above described and/or referenced methods. For example, in addition to the methods above, techniques have been proposed which produce nucleic acid multimers suitable for transformation into a variety of species (see, e.g., Schellenberger U.S. Pat. No. 5,756,316 and the references above). Transformation of a suitable host with such multimers, consisting of genes that are divergent with respect to one another (e.g., derived from natural diversity or through application of site-directed mutagenesis, error-prone PCR, passage through mutagenic bacterial strains, and the like), provides a source of nucleic acid diversity for DNA diversification, e.g., by an in vivo recombination process as indicated above.

Alternatively, a multiplicity of monomeric polynucleotides sharing regions of partial sequence similarity can be transformed into a host species and recombined in vivo by the host cell. Subsequent rounds of cell division can be used to generate libraries, members of which include a single, homogenous population, or pool of monomeric polynucleotides. Alternatively, the monomeric nucleic acids can be recovered by standard techniques, e.g., PCR and/or cloning, and recombined in any of the recombination formats, including recursive recombination formats, described above.

Methods for generating multispecies expression libraries have been described (in addition to the references noted above, see, e.g., Peterson et al. (1998) U.S. Pat. No. 5,783,431 “METHODS FOR GENERATING AND SCREENING NOVEL METABOLIC PATHWAYS;” and Thompson et al. (1998) U.S. Pat. No. 5,824,485 “METHODS FOR GENERATING AND SCREENING NOVEL METABOLIC PATHWAYS”), and their use to identify protein activities of interest has been proposed (in addition to the references noted above, see Short (1999) U.S. Pat. No. 5,958,672 “PROTEIN ACTIVITY SCREENING OF CLONES HAVING DNA FROM UNCULTIVATED MICROORGANISMS”). Multispecies expression libraries include, in general, libraries comprising cDNA or genomic sequences from a plurality of species or strains, operably linked to appropriate regulatory sequences, in an expression cassette. The cDNA and/or genomic sequences are optionally randomly ligated to further enhance diversity. The vector can be a shuttle vector suitable for transformation and expression in more than one species of host organism, e.g., bacterial species or eukaryotic cells. In some cases, the library is biased by preselecting sequences which encode a protein of interest, or which hybridize to a nucleic acid of interest. Any such libraries can be provided as substrates for any of the methods herein described.

The above described procedures have been largely directed to increasing nucleic acid and/or encoded protein diversity. However, in many cases not all of the diversity is useful, e.g., functional. The non-functional portion merely increases the background of variants that must be screened or selected to identify the few favorable variants. In some applications, it is desirable to preselect or prescreen libraries (e.g., an amplified library, a genomic library, a cDNA library, a normalized library, etc.) or other substrate nucleic acids prior to diversification, e.g., by recombination-based mutagenesis procedures, or to otherwise bias the substrates towards nucleic acids that encode functional products. For example, in the case of antibody engineering, it is possible to bias the diversity generating process toward antibodies with functional antigen binding sites by taking advantage of in vivo recombination events prior to manipulation by any of the described methods. For instance, recombined CDRs derived from B cell cDNA libraries can be amplified and assembled into framework regions (e.g., Jirholt et al. (1998) “Exploiting sequence space: shuffling in vivo formed complementarity determining regions into a master framework” Gene 215: 471) prior to diversifying according to any of the methods described herein.

Libraries can be biased towards nucleic acids which encode proteins with desirable enzyme activities. For example, after identifying a clone from a library which exhibits a specified activity, the clone can be mutagenized using any known method for introducing DNA alterations. A library comprising the mutagenized homologues is then screened for a desired activity, which can be the same as or different from the initially specified activity. An example of such a procedure is proposed in Short (1999) U.S. Pat. No. 5,939,250 for “PRODUCTION OF ENZYMES HAVING DESIRED ACTIVITIES BY MUTAGENESIS.” Desired activities can be identified by any method known in the art. For example, WO 99/10539 proposes that gene libraries can be screened by combining extracts from the gene library with components obtained from metabolically rich cells and identifying combinations which exhibit the desired activity. It has also been proposed (e.g., WO 98/58085) that clones with desired activities can be identified by inserting bioactive substrates into samples of the library and detecting bioactive fluorescence corresponding to the product of a desired activity using a fluorescent analyzer, e.g., a flow cytometry device, a CCD, a fluorometer, or a spectrophotometer.

Libraries can also be biased towards nucleic acids which have specified characteristics, e.g., hybridization to a selected nucleic acid probe. For example, WO 99/10539 proposes that polynucleotides encoding a desired activity can be identified from among genomic DNA sequences. Examples of such activities include enzymatic activities, such as those of a lipase, an esterase, a protease, a glycosidase, a glycosyl transferase, a phosphatase, a kinase, an oxygenase, a peroxidase, a hydrolase, a hydratase, a nitrilase, a transaminase, an amidase or an acylase. In particular, single-stranded DNA molecules from a population of genomic DNA are hybridized to a ligand-conjugated probe. The genomic DNA can be derived from either a cultivated or uncultivated microorganism, or from an environmental sample. Alternatively, the genomic DNA can be derived from a multicellular organism, or a tissue derived therefrom. Second strand synthesis can be conducted directly from the hybridization probe used in the capture, with or without prior release from the capture medium, or by a wide variety of other strategies known in the art. Alternatively, the isolated single-stranded genomic DNA population can be fragmented without further cloning and used directly in, e.g., a recombination-based approach that employs a single-stranded template, as described above.

“Non-stochastic” methods of generating nucleic acids and polypeptides are described in Short, “Non-Stochastic Generation of Genetic Vaccines and Enzymes,” WO 00/46344. These methods, including proposed non-stochastic polynucleotide reassembly and site-saturation mutagenesis methods, can be applied to the present invention as well. Random or semi-random mutagenesis using doped or degenerate oligonucleotides is also described in, e.g., Arkin and Youvan (1992) “Optimizing nucleotide mixtures to encode specific subsets of amino acids for semi-random mutagenesis” Biotechnology 10: 297-300; Reidhaar-Olson et al. (1991) “Random mutagenesis of protein sequences using oligonucleotide cassettes” Methods Enzymol. 208: 564-86; Lim and Sauer (1991) “The role of internal packing interactions in determining the structure and stability of a protein” J. Mol. Biol. 219: 359-76; and Breyer and Sauer (1989) “Mutational analysis of the fine specificity of binding of monoclonal antibody 51F to lambda repressor” J. Biol. Chem. 264: 13355-60. See also “Walk-Through Mutagenesis” (Crea, R.; U.S. Pat. Nos. 5,830,650 and 5,798,208, and EP Patent 0527809 B1).
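The choice of degenerate codon sets the trade-off between library size and amino acid coverage at each randomized position. The sketch below expands a degenerate codon written in IUPAC notation into all of its DNA codons. It then translates them with the standard genetic code, NCBI translation table 1, encoded as the usual 64-character string with bases ordered T, C, A, G. The designs compared (NNN, NNK, NDT) are common conventions used here only as examples; they are not taken from the cited references.

```python
from itertools import product

# Standard genetic code; codon index = 16*i + 4*j + k with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def degenerate_codon_coverage(codon):
    """Return (number of DNA codons, set of encoded residues) for a
    degenerate codon such as "NNK"; '*' in the set marks a stop codon."""
    codons = ["".join(c) for c in product(*(IUPAC[x] for x in codon))]
    return len(codons), {CODON_TABLE[c] for c in codons}

for design in ("NNN", "NNK", "NDT"):
    n, residues = degenerate_codon_coverage(design)
    print(design, n, "codons,", len(residues - {"*"}), "amino acids,",
          "includes stop" if "*" in residues else "no stop")
```

NNK halves the number of codons relative to NNN (32 instead of 64) and still encodes all 20 amino acids, with a single stop codon (TAG). NDT gives 12 codons for 12 different amino acids and no stops, which trades coverage for a smaller library to screen.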

It will be readily appreciated that any of the above described techniques suitable for enriching a library prior to diversification can also be used to screen the products, or libraries of products, produced by the diversity generating methods. Any of the above described methods can be practiced recursively or in combination to alter nucleic acids, e.g., GAT encoding polynucleotides.

Kits for mutagenesis, library construction and other diversity generation methods are also commercially available, e.g., from Stratagene (e.g., QuickChange™ site-directed mutagenesis kit; and Chameleon™ double-stranded, site-directed mutagenesis kit); Bio/Can Scientific; Bio-Rad (e.g., using the Kunkel method described above); Boehringer Mannheim Corp.; Clontech Laboratories; DNA Technologies; Epicentre Technologies (e.g., 5 prime 3 prime kit); Genpak Inc.; Lemargo Inc.; Life Technologies (Gibco BRL); New England Biolabs; Pharmacia Biotech; Promega Corp.; Quantum Biotechnologies; Amersham International plc (e.g., using the Eckstein method above); and Anglian Biotechnology Ltd (e.g., using the Carter/Winter method above).

The above references provide many mutational formats, including recombination, recursive recombination, recursive mutation and combinations of recombination with other forms of mutagenesis, as well as many modifications of these formats. Regardless of the diversity generation format that is used, the nucleic acids of the present invention can be recombined (with each other, or with related or even unrelated sequences) to produce a diverse set of recombinant nucleic acids for use in the gene fusion constructs and modified gene fusion constructs of the present invention, including, e.g., sets of homologous nucleic acids, as well as corresponding polypeptides.

Many of the above-described methodologies for generating modified polynucleotides generate a large number of diverse variants of a parental sequence or sequences. In some preferred embodiments of the invention, the modification technique (e.g., some form of shuffling) is used to generate a library of variants that is then screened for a modified polynucleotide or pool of modified polynucleotides encoding some desired functional attribute, e.g., improved GAT activity. Exemplary enzymatic activities that can be screened for include catalytic rates (conventionally characterized in terms of kinetic constants such as k_cat and K_M), substrate specificity, and susceptibility to activation or inhibition by substrate, product or other molecules (e.g., inhibitors or activators).
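The standard Michaelis-Menten relationship shows how the two kinetic constants named above enter a measured rate. It assumes simple saturation kinetics in the varied substrate [S] (here glyphosate) at a fixed enzyme concentration [E]_0. For a bisubstrate enzyme such as GAT, this also means holding the acetyl donor (acetyl-CoA) at a fixed, saturating concentration. That is a modeling assumption, not an assay condition specified here.

```latex
v_0 = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_M + [S]},
\qquad
v_{\max} = \lim_{[S]\to\infty} v_0 = k_{\mathrm{cat}}\,[E]_0,
\qquad
v_0 \approx \frac{k_{\mathrm{cat}}}{K_M}\,[E]_0\,[S] \quad \text{when } [S] \ll K_M
```

When glyphosate is well below K_M, the ratio k_cat/K_M (catalytic efficiency) governs the rate. A variant can therefore outperform the parent at low glyphosate concentrations by lowering K_M, even without any increase in k_cat.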

One example of selection for a desired enzymatic activity entails growing host cells under conditions that inhibit the growth and/or survival of cells that do not sufficiently express an enzymatic activity of interest, e.g., the GAT activity. Using such a selection process can eliminate from consideration all modified polynucleotides except those encoding a desired enzymatic activity. For example, in some embodiments of the invention, host cells are maintained under conditions that inhibit cell growth or survival in the absence of sufficient levels of GAT. One such condition is a concentration of glyphosate that is lethal to, or inhibits the growth of, a wild-type plant of the same variety that either lacks or does not express a GAT polynucleotide. Under these conditions, the only host cells that survive and grow are those harboring a modified nucleic acid that encodes an enzymatic activity or activities able to catalyze production of sufficient levels of the product. Some embodiments of the invention employ multiple rounds of screening at increasing concentrations of glyphosate or a glyphosate analog.
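The logic of selection at increasing glyphosate concentrations can be written as a filter applied at a rising threshold. In the sketch below, each variant is given a hypothetical tolerance: the highest glyphosate concentration, in arbitrary units, at which a host cell carrying it still grows. In a real selection this value is never known in advance. It shows up only as growth or no growth on the selective medium. The variant names and values are illustrative.

```python
def escalating_selection(tolerances, concentrations):
    """Apply successive selection rounds at increasing concentrations.

    tolerances: dict mapping a variant ID to the highest concentration at
    which its host cell still grows (a hypothetical stand-in for growth).
    Returns the final survivors and a per-round record of who survived.
    """
    survivors = dict(tolerances)
    history = []
    for conc in sorted(concentrations):
        # Only variants tolerating the current concentration keep growing.
        survivors = {v: t for v, t in survivors.items() if t >= conc}
        history.append((conc, sorted(survivors)))
        if not survivors:
            break
    return survivors, history

pool = {"wild_type": 0.5, "var1": 2.0, "var2": 5.0, "var3": 12.0}
final, rounds = escalating_selection(pool, [1, 5, 10])
for conc, alive in rounds:
    print(conc, alive)
```

In practice the survivors of each round would usually be re-diversified, for example by another cycle of shuffling, before the next and more stringent round. That step is what turns a simple filter into the iterative formats described earlier.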

In some embodiments of the invention, mass spectrometry is used to detect the acetylation of glyphosate, or of a glyphosate analog or metabolite. The use of mass spectrometry is described in more detail in the Examples below.
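As a worked example of what such a measurement looks for, N-acetylation adds an acetyl group (C2H3O) to the amine nitrogen and removes one hydrogen from it, a net gain of C2H2O. The sketch below computes monoisotopic masses of glyphosate (C3H8NO5P) and N-acetylglyphosate (C5H10NO6P) from standard isotopic masses. Reporting the deprotonated [M-H]- ion is an assumption for illustration only; which ion species is monitored depends on the instrument method.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
PROTON = 1.007276466812  # mass of a proton (u)


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in formula.items())


def add_formulas(a, b):
    """Element-wise sum of two formulas."""
    total = dict(a)
    for element, n in b.items():
        total[element] = total.get(element, 0) + n
    return total


GLYPHOSATE = {"C": 3, "H": 8, "N": 1, "O": 5, "P": 1}
ACETYL_SHIFT = {"C": 2, "H": 2, "O": 1}  # +C2H3O on N, -H from N
N_ACETYLGLYPHOSATE = add_formulas(GLYPHOSATE, ACETYL_SHIFT)

for name, formula in [("glyphosate", GLYPHOSATE),
                      ("N-acetylglyphosate", N_ACETYLGLYPHOSATE)]:
    m = mono_mass(formula)
    # [M-H]- shown for illustration only (assumes negative-ion detection).
    print(f"{name}: M = {m:.4f}, [M-H]- m/z = {m - PROTON:.4f}")
print(f"mass shift on acetylation: +{mono_mass(ACETYL_SHIFT):.4f}")
```

A product peak shifted by about +42.011 u from the substrate is therefore the signature of acetylation, regardless of which ion form is monitored.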

For convenience and high throughput, it will often be desirable to screen or select for desired modified nucleic acids in a microorganism, e.g., a bacterium such as E. coli. On the other hand, screening in plant cells or plants can in some cases be preferable where the ultimate aim is to generate a modified nucleic acid for expression in a plant system.

In some preferred embodiments of the invention, throughput is increased by screening pools of host cells expressing different modified nucleic acids, either alone or as part of a gene fusion construct. Any pools showing significant activity can be deconvoluted to identify single clones expressing the desired activity.
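One common way to deconvolute pools, offered here only as an illustrative assumption (no particular pooling design is specified above), is a two-dimensional row/column scheme: each clone on a grid is pooled once by row and once by column, and candidate positives are the intersections of active row pools and active column pools, which are then retested individually.

```python
from itertools import product


def pool_readout(active, n_rows, n_cols):
    """Simulate which row pools and column pools test positive.

    active: set of (row, col) positions of clones that truly carry the desired
        activity; unknown in practice, used here only to simulate the assay.
    """
    rows = {r for r in range(n_rows) if any((r, c) in active for c in range(n_cols))}
    cols = {c for c in range(n_cols) if any((r, c) in active for r in range(n_rows))}
    return rows, cols


def candidates(positive_rows, positive_cols):
    """Clones at the intersection of positive row and column pools."""
    return list(product(sorted(positive_rows), sorted(positive_cols)))


# 96 clones on an 8 x 12 grid are covered by 8 + 12 = 20 pooled assays.
rows, cols = pool_readout({(2, 5)}, 8, 12)
print(candidates(rows, cols))  # [(2, 5)]

# Two positives in different rows and columns give four candidates to retest.
rows, cols = pool_readout({(1, 1), (3, 4)}, 8, 12)
print(candidates(rows, cols))  # [(1, 1), (1, 4), (3, 1), (3, 4)]
```

When positives are rare this greatly reduces the number of assays; as positives become common, ambiguous intersections multiply and more individual retests are needed.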

The skilled artisan will recognize that the relevant assay, screening or selection method will vary depending upon the desired host organism and other parameters known in the art. It is normally advantageous to employ an assay that can be practiced in a high-throughput format.

In high-throughput assays, it is possible to screen up to several thousand different variants in a single day. For example, each well of a microtiter plate can be used to run a separate assay, or, if concentration or incubation time effects are to be observed, every 5-10 wells can test a single variant.
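The following sketch shows one way to lay out such an experiment on a standard 96-well plate (rows A to H, columns 1 to 12), giving each variant a block of consecutive wells in column order so that, with 8 wells per variant, each variant fills one column (e.g., for a concentration or incubation-time series). The function names and layout convention are illustrative assumptions.

```python
import string

ROWS = string.ascii_uppercase[:8]  # A-H
COLS = range(1, 13)                # 1-12


def column_major_wells():
    """Well names of a 96-well plate in column order: A1, B1, ..., H1, A2, ..."""
    return [f"{row}{col}" for col in COLS for row in ROWS]


def layout(variants, wells_per_variant):
    """Assign each variant a block of consecutive wells, starting a new plate
    when the current one is full. Returns {variant: [(plate, well), ...]}."""
    wells = column_major_wells()
    if not 1 <= wells_per_variant <= len(wells):
        raise ValueError("wells_per_variant must be between 1 and 96")
    per_plate = len(wells) // wells_per_variant
    plan = {}
    for i, variant in enumerate(variants):
        plate, slot = divmod(i, per_plate)
        start = slot * wells_per_variant
        plan[variant] = [(plate + 1, w) for w in wells[start:start + wells_per_variant]]
    return plan


plan = layout([f"variant{i}" for i in range(13)], wells_per_variant=8)
print(plan["variant0"][0], plan["variant0"][-1])  # (1, 'A1') (1, 'H1')
print(plan["variant12"][0])                       # (2, 'A1')
```

At 8 wells per variant one plate holds 12 variants, whereas at one well per variant it holds 96, which is why single-well assays are preferred when only throughput matters.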

In addition to fluidic approaches, it is possible, as mentioned above, simply to grow cells on media plates that select for the desired enzymatic or metabolic function. This approach offers a simple, high-throughput screening method.

A number of well-known robotic systems have also been developed for solution-phase chemistries useful in assay systems. These systems include automated workstations such as the automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, Mass.; and Orca, Hewlett-Packard, Palo Alto, Calif.) which mimic the manual synthetic operations performed by a scientist. Any of the above devices are suitable for application to the present invention. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein with reference to the integrated system will be apparent to persons skilled in the relevant art.

High-throughput screening systems are commercially available (see, e.g., Zymark Corp., Hopkinton, Mass.; Air Technical Industries, Mentor, Ohio; Beckman Instruments, Inc., Fullerton, Calif.; Precision Systems, Inc., Natick, Mass., etc.). These systems typically automate entire procedures, including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate for the particular assay. These configurable systems provide high throughput and rapid start-up as well as a high degree of flexibility and customization.

The manufacturers of such systems provide detailed protocols for the various high-throughput devices. Thus, for example, Zymark Corp. provides technical bulletins describing screening systems for detecting the modulation of gene transcription, ligand binding, and the like. Microfluidic approaches to reagent manipulation have also been developed, e.g., by Caliper Technologies (Mountain View, Calif.).

Optical images viewed (and, optionally, recorded) by a camera or other recording device (e.g., a photodiode and data storage device) are optionally further processed in any of the embodiments herein, e.g., by digitizing the image and/or storing and analyzing the image on a computer. A variety of commercially available peripheral equipment and software is available for digitizing, storing and analyzing a digitized video or digitized optical image, e.g., using PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™ or WINDOWS 95™ based machines), MACINTOSH™, or UNIX-based (e.g., SUN™ workstation) computers.

One conventional system, as is common in the art, carries light from the assay device to a cooled charge-coupled device (CCD) camera. A CCD camera includes an array of picture elements (pixels). The light from the specimen is imaged on the CCD. Particular pixels corresponding to regions of the specimen (e.g., individual hybridization sites on an array of biological polymers) are sampled to obtain light intensity readings for each position. Multiple pixels are processed in parallel to increase speed. The apparatus and methods of the invention are easily used for viewing any sample, e.g., by fluorescent or dark-field microscopic techniques.
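Sampling pixels by region can be sketched as averaging the pixel values inside each region of interest and subtracting a background level. The rectangular regions, the background-subtraction step, and the plain nested-list image are assumptions made for illustration; real instruments supply their own image formats and calibration.

```python
def region_mean(image, top, left, height, width):
    """Mean intensity of a rectangular block of pixels in a 2-D image
    given as a list of rows."""
    values = [
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    ]
    return sum(values) / len(values)


def read_sites(image, sites, background=0.0):
    """Background-subtracted mean intensity for each named region.

    sites: {name: (top, left, height, width)}, e.g., one entry per
        hybridization site or microplate well (hypothetical layout).
    """
    return {
        name: region_mean(image, *box) - background
        for name, box in sites.items()
    }


# A toy 4 x 4 "CCD frame" with a bright 2 x 2 spot in the upper-left corner.
frame = [
    [100, 100, 10, 10],
    [100, 100, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
print(read_sites(frame, {"spot": (0, 0, 2, 2), "blank": (2, 2, 2, 2)}, background=10))
# {'spot': 90.0, 'blank': 0.0}
```

Averaging over several pixels per site reduces the influence of single noisy pixels on each intensity reading.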

Other Polynucleotide Compositions

The invention also includes compositions comprising two or more polynucleotides of the invention (e.g., as substrates for recombination). The composition can comprise a library of recombinant nucleic acids, where the library contains at least 2, 3, 5, 10, 20, or 50 or more polynucleotides. The polynucleotides are optionally cloned into expression vectors, providing expression libraries.

The invention also includes compositions produced by digesting one or more polynucleotides of the invention with a restriction endonuclease, an RNase, or a DNase (e.g., as is performed in certain of the recombination formats noted above); and compositions produced by fragmenting or shearing one or more polynucleotides of the invention by mechanical means (e.g., sonication, vortexing, and the like), which can also be used to provide substrates for recombination in the methods above. Similarly, compositions comprising sets of oligonucleotides corresponding to more than one nucleic acid of the invention are useful as recombination substrates and are a feature of the invention. For convenience, these fragmented, sheared, or oligonucleotide-synthesized mixtures are referred to as fragmented nucleic acid sets.

Also included in the invention are compositions produced by incubating one or more of the fragmented nucleic acid sets in the presence of ribonucleotide or deoxyribonucleotide triphosphates and a nucleic acid polymerase. The resulting composition forms a recombination mixture for many of the recombination formats noted above. The nucleic acid polymerase may be an RNA polymerase, a DNA polymerase, or an RNA-directed DNA polymerase (e.g., a “reverse transcriptase”); the polymerase can be, e.g., a thermostable DNA polymerase (such as VENT, TAQ, or the like).
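The outcome of such fragment reassembly can be caricatured in silico. The sketch below builds chimeras from pre-aligned parental sequences of equal length by switching templates at randomly chosen breakpoints. It is a hypothetical, simplified model (real reassembly depends on local sequence identity between parents, which is ignored here) meant only to show how a recombination library mixes parental segments.

```python
import random


def chimera(parents, n_crossovers, rng):
    """Build one chimera from aligned, equal-length parental sequences.

    For each segment between random breakpoints, a template is drawn at
    random (it may be the same parent again), mimicking template switching
    during reassembly of a fragmented nucleic acid set.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0, *cuts, length]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        template = rng.choice(parents)
        pieces.append(template[start:end])
    return "".join(pieces)


parents = ["ATGGCAAAAGGTCTG", "ATGGCTAAGGGCCTA", "ATGGCGAAAGGTTTG"]  # toy sequences
rng = random.Random(42)  # seeded for reproducibility
library = [chimera(parents, n_crossovers=2, rng=rng) for _ in range(5)]
print("\n".join(library))
```

Every position in each chimera comes from one of the parents, so diversity arises only from new combinations of parental sequence, which is the defining property of recombination-based libraries.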

Integrated Systems

The present invention provides computers, computer-readable media and integrated systems comprising character strings corresponding to the sequence information herein for the polypeptides and nucleic acids herein, including, e.g., those sequences listed herein and the various silent substitutions and conservative substitutions thereof.

For example, various methods and genetic algorithms (GAs) known in the art can be used to detect homology or similarity between different character strings, or to perform other desirable functions, such as controlling output files or providing the basis for presentations of information including the sequences, and the like. Examples include BLAST, discussed supra.

Thus, different types of homology and similarity of various stringency and length can be detected and recognized in the integrated systems described herein. For example, many homology determination methods have been designed for comparative analysis of sequences of biopolymers, for spell-checking in word processing, and for data retrieval from various databases. With an understanding of double-helix pair-wise complement interactions among the four principal nucleobases in natural polynucleotides, models that simulate annealing of complementary homologous polynucleotide strings can also be used as a foundation of sequence alignment or other operations typically performed on the character strings corresponding to the sequences herein (e.g., word-processing manipulations, construction of figures comprising sequence or subsequence character strings, output tables, etc.). An example of a software package with GAs for calculating sequence similarity is BLAST, which can be adapted to the present invention by inputting character strings corresponding to the sequences herein.
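As a concrete illustration of operating on such character strings, the sketch below computes a Needleman-Wunsch global alignment score and checks perfect Watson-Crick complementarity via the reverse complement. It is a textbook dynamic-programming example, not BLAST (which uses heuristic local alignment), and the scoring values (match +1, mismatch -1, gap -1) are arbitrary illustrative choices.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_perfectly(strand, other):
    """True if `other` is the exact Watson-Crick partner of `strand`."""
    return other == reverse_complement(strand)


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score (Needleman-Wunsch, linear gap cost),
    keeping only two rows of the dynamic-programming matrix."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(nw_score("GATTACA", "GATTACA"))     # 7
print(nw_score("GATTACA", "GCATGCU"))     # 0
print(anneals_perfectly("ATGC", "GCAT"))  # True
```

Storing only two rows keeps memory proportional to the shorter dimension, which matters when comparing many full-length GAT sequences; recovering the alignment itself (not just the score) would require the full matrix or a traceback-friendly variant.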

Similarly, standard desktop applications such as word processing software (e.g., Microsoft Word™ or Corel WordPerfect™) and database software (e.g., spreadsheet software such as Microsoft Excel™, Corel Quattro Pro™, or database programs such as Microsoft Access™ or Paradox™) can be adapted to the present invention by inputting a character string corresponding to the GAT homologues of the invention (either nucleic acids or proteins, or both). For example, the integrated systems can include the foregoing software having the appropriate character string information, e.g., used in conjunction with a user interface (e.g., a GUI in a standard operating system such as a Windows, Macintosh or LINUX system) to manipulate strings of characters. As noted, specialized alignment programs such as BLAST can also be incorporated into the systems of the invention for alignment of nucleic acids or proteins (or corresponding character strings).

Integrated systems for analysis in the present invention typically include a digital computer with GA software for aligning sequences, as well as data sets entered into the software system comprising any of the sequences herein. The computer can be, e.g., a PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™, WINDOWS95™, WINDOWS98™, or LINUX-based machine), a MACINTOSH™, a Power PC, or a UNIX-based (e.g., SUN™ workstation) machine, or another commercially common computer known to one of skill. Software for aligning or otherwise manipulating sequences is available, or can easily be constructed by one of skill using a standard programming language such as Visual Basic, Fortran, Basic, Java, or the like.

Any controller or computer optionally includes a monitor, which is often a cathode ray tube (“CRT”) display, a flat panel display (e.g., active matrix liquid crystal display, liquid crystal display), or others. Computer circuitry is often placed in a box which includes numerous integrated circuit chips, such as a microprocessor, memory, interface circuits, and others. The box also optionally includes a hard disk drive, a floppy disk drive, a high-capacity removable drive such as a writeable CD-ROM, and other common peripheral elements. Input devices such as a keyboard or mouse optionally provide for input from a user and for user selection of sequences to be compared or otherwise manipulated in the relevant computer system.

The computer typically includes appropriate software for receiving user instructions, either in the form of user input into set parameter fields, e.g., in a GUI, or in the form of preprogrammed instructions, e.g., preprogrammed for a variety of different specific operations. The software then converts these instructions to appropriate language for instructing the operation of the fluid direction and transport controller to carry out the desired operation.

The software can also include output elements for controlling nucleic acid synthesis (e.g., based upon a sequence or an alignment of sequences herein) or other operations which occur downstream from an alignment or other operation performed using a character string corresponding to a sequence herein. Nucleic acid synthesis equipment can, accordingly, be a component in one or more integrated systems herein.
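One simple example of an operation downstream of an alignment is deriving a consensus sequence that could be passed to synthesis equipment. The sketch below takes a majority vote per alignment column; the tie-breaking rule and the treatment of gaps are assumptions chosen for illustration, not part of the described system.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Ties are broken by character order; because "-" sorts before letters,
    a tie between a gap and a residue drops the column. Columns whose most
    common symbol is a gap are omitted from the output.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        symbol = min(counts, key=lambda s: (-counts[s], s))
        if symbol != gap:
            out.append(symbol)
    return "".join(out)


print(consensus(["ATG-C", "ATGAC", "ACGAC"]))  # ATGAC
print(consensus(["AT-C", "AT-C", "ATGC"]))     # ATC
```

The resulting ungapped string is the kind of output that could be handed to synthesis equipment, although a real design step would also check the reading frame and other sequence constraints.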

In an additional aspect, the present invention provides kits embodying the methods, compositions, systems and apparatus herein. Kits of the invention optionally comprise one or more of the following: (1) an apparatus, system, system component or apparatus component as described herein; (2) instructions for practicing the methods described herein, and/or for operating the apparatus or apparatus components herein and/or for using the compositions herein; (3) one or more GAT compositions or components; (4) a container for holding components or compositions; and (5) packaging materials.

In a further aspect, the present invention provides for the use of any apparatus, apparatus component, composition or kit herein, for the practice of any method or assay herein, and/or for the use of any apparatus or kit to practice any assay or method herein.

Host Cells and Organisms

The host cell can be eukaryotic, for example, a plant cell, an animal cell, a protoplast, or a tissue culture cell. The host cell optionally comprises a plurality of cells, for example, an organism. Alternatively, the host cell can be prokaryotic, including, but not limited to, bacteria (e.g., gram-positive bacteria, purple bacteria, green sulfur bacteria, green non-sulfur bacteria, cyanobacteria, spirochetes, Thermotogales, flavobacteria, and bacteroides) and archaebacteria (e.g., Korarchaeota, Thermoproteus, Pyrodictium, Thermococcales, methanogens, Archaeoglobus, and extreme halophiles).

Transgenic plants, or plant cells, incorporating the GAT nucleic acids and/or expressing the GAT polypeptides of the invention are a feature of the invention. The transformation of plant cells and protoplasts can be carried out in essentially any of the various ways known to those skilled in the art of plant molecular biology, including, but not limited to, the methods described herein. See, in general, Methods in Enzymology, Vol. 153 (Recombinant DNA Part D), Wu and Grossman (eds.) 1987, Academic Press; and Weising et al., Ann. Rev. Genet. 22: 421-477 (1988), incorporated herein by reference. For example, the DNA construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation, PEG-mediated transfection, particle bombardment, silicon fiber delivery, or microinjection of plant cell protoplasts or embryogenic callus. See, e.g., Tomes et al. (1995) “Direct DNA Transfer into Intact Plant Cells Via Microprojectile Bombardment,” in Plant Cell, Tissue and Organ Culture, Fundamental Methods, eds. Gamborg and Phillips (Springer-Verlag, Berlin), pp. 197-213. Further methods for transforming various host cells are disclosed in Klein et al. (1992) “Transformation of microbes, plants and animals by particle bombardment,” Bio/Technol. 10(3): 286-291.

The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al. (1984) EMBO J. 3:2717-2722. Electroporation techniques are described in Fromm et al. (1985) Proc. Natl. Acad. Sci. 82:5824. Ballistic transformation techniques are described in Klein et al. (1987) Nature 327: 70-73.

Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria. See U.S. Pat. No. 5,591,616.

Agrobacterium tumefaciens-mediated transformation techniques are well described in the scientific literature. See, for example, Horsch et al. (1984) Science 223: 496-498, and Fraley et al. (1983) Proc. Natl. Acad. Sci. 80:4803. For instance, Agrobacterium transformation of maize is described in U.S. Pat. Nos. 5,550,318 and 5,981,840.

Other methods of transformation include (1) Agrobacterium rhizogenes-mediated transformation (see, e.g., Lichtenstein and Fuller In: Genetic Engineering, Vol. 6, P W J Rigby, ed., London, Academic Press, 1987; Lichtenstein, C. P., and Draper, J., In: DNA Cloning, Vol. II, D. M. Glover, Ed., Oxford, IRL Press, 1985; WO 88/02405 describes the use of A. rhizogenes strain A4 and its Ri plasmid along with A. tumefaciens vectors pARC8 or pARC16); (2) liposome-mediated DNA uptake (see, e.g., Freeman et al. (1984) Plant Cell Physiol. 25:1353); and (3) the vortexing method (see, e.g., Kindle (1990) Proc. Natl. Acad. Sci. USA 87:1228).

DNA can also be introduced into plants by direct DNA transfer into pollen as described by Zhou et al. (1983) Methods in Enzymology 101:433; D. Hess (1987) Intern. Rev. Cytol. 107:367; and Luo et al. (1988) Plant Mol. Biol. Reporter 6:165. Expression of polypeptide-coding nucleic acids can be obtained by injection of the DNA into reproductive organs of a plant as described by Pena et al. (1987) Nature 325:274. DNA can also be injected directly into the cells of immature embryos, or introduced by rehydration of desiccated embryos, as described by Neuhaus et al. (1987) Theor. Appl. Genet. 75: 30; and Benbrook et al. (1986) in Proceedings Bio Expo 1986, Butterworth, Stoneham, Mass., pp. 27-54.

Animal and lower eukaryotic (e.g., yeast) host cells are competent or rendered competent for transfection by various means. There are several well-known methods of introducing DNA into animal cells. These methods include: calcium phosphate precipitation; fusion of the recipient cells with bacterial protoplasts containing the DNA; treatment of the recipient cells with liposomes containing the DNA; DEAE dextran; electroporation; biolistics; and microinjection of the DNA directly into the cells. The transfected cells are cultured by means well known in the art. See Kuchler, R. J. (1977) Biochemical Methods in Cell Culture and Virology (Dowden, Hutchinson and Ross, Inc.). As used herein, the term “transformation” means alteration of the genotype of a host plant by the introduction of a nucleic acid sequence, e.g., a “heterologous” or “foreign” nucleic acid sequence. The heterologous nucleic acid sequence need not necessarily originate from a different source, but it will, at some point, have been external to the cell into which it is introduced.

In addition to Berger, Ausubel and Sambrook, useful general references for plant cell cloning, culture and regeneration include Jones, ed. (1995) Plant Gene Transfer and Expression Protocols, Methods in Molecular Biology, volume 49 (Humana Press, Totowa, N.J.); Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems (John Wiley & Sons, Inc., New York, N.Y.) (“Payne”); and Gamborg and Phillips, eds. (1995) Plant Cell, Tissue and Organ Culture; Fundamental Methods/Springer Lab Manual (Springer-Verlag, Berlin) (“Gamborg”). A variety of cell culture media are described in Atlas and Parks, eds., The Handbook of Microbiological Media (CRC Press, Boca Raton, Fla.) (“Atlas”). Additional information for plant cell culture is found in available commercial literature such as the Life Science Research Cell Culture Catalogue (1998) from Sigma-Aldrich, Inc. (St. Louis, Mo.) (Sigma-LSRCCC) and, e.g., the Plant Culture Catalogue and supplement (1997), also from Sigma-Aldrich, Inc. (St. Louis, Mo.) (Sigma-PCCS). Additional details regarding plant cell culture are found in Croy, ed. (1993) Plant Molecular Biology (Bios Scientific Publishers, Oxford, UK).

In an embodiment of this invention, recombinant vectors including one or more GAT polynucleotides, suitable for the transformation of plant cells, are prepared. A DNA sequence encoding the desired GAT polypeptide, e.g., selected from among SEQ ID NO: 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 768, 770, 772, 774, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 824, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, 930, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 947, 949, 951, and 952, is conveniently used to construct a recombinant expression cassette which can be introduced into the desired plant. In the context of the present invention, an expression cassette will typically comprise a selected GAT polynucleotide operably linked to a promoter sequence and other transcriptional and translational initiation regulatory sequences which are sufficient to direct the transcription of the GAT sequence in the intended tissues (e.g., entire plant, leaves, roots, etc.) of the transformed plant.
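The structure of such a cassette can be represented as a simple data type that joins its parts in 5′ to 3′ order and performs basic sanity checks on the coding sequence. This is a hypothetical sketch: the class and field names are illustrative, the element sequences in the example are short placeholders rather than real promoter or terminator sequences, and a real design involves further considerations (e.g., codon usage, introns, transit peptides).

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


@dataclass
class ExpressionCassette:
    """Hypothetical model of a plant expression cassette (5' to 3')."""
    promoter: str    # transcriptional regulatory sequence
    coding: str      # GAT coding sequence, ATG ... stop codon
    terminator: str  # e.g., a polyadenylation region

    def validate(self):
        """Raise ValueError if the coding sequence is not a single clean ORF."""
        cds = self.coding.upper()
        if len(cds) % 3 != 0:
            raise ValueError("coding sequence length is not a multiple of 3")
        triplets = codons(cds)
        if not triplets or triplets[0] != "ATG":
            raise ValueError("coding sequence does not start with ATG")
        if triplets[-1] not in STOP_CODONS:
            raise ValueError("coding sequence does not end with a stop codon")
        if any(c in STOP_CODONS for c in triplets[:-1]):
            raise ValueError("in-frame stop codon inside the coding sequence")

    def assemble(self):
        """Validate, then join the elements in 5'->3' order."""
        self.validate()
        return self.promoter + self.coding + self.terminator


# Placeholder elements only; not real regulatory sequences.
cassette = ExpressionCassette(promoter="PPPP", coding="ATGGCTTAA", terminator="TTTT")
print(cassette.assemble())  # PPPPATGGCTTAATTTT
```

Checking the open reading frame before assembly catches frame-shifting or truncating errors early, before any construct is built.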

A number of promoters can be used in the practice of the present invention. The promoters can be selected based on the desired outcome. That is, the nucleic acids can be combined with constitutive, tissue-preferred, or other promoters for expression in plants.

Constitutive promoters include, for example, the core promoter of the Rsyn7 promoter and other constitutive promoters disclosed in WO 99/43838 and U.S. Pat. No. 6,072,050; the core CaMV 35S promoter (Odell et al. (1985) Nature 313:810-812); rice actin (McElroy et al. (1990) Plant Cell 2:163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12:619-632 and Christensen et al. (1992) Plant Mol. Biol. 18:675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81:581-588); MAS (Velten et al. (1984) EMBO J. 3:2723-2730); ALS promoter (U.S. Pat. No. 5,659,026), and the like. Other constitutive promoters include, for example, those disclosed in U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611.

Chemical-regulated promoters can be used to modulate the expression of a gene in a plant through the application of an exogenous chemical regulator. Depending upon the objective, the promoter may be a chemical-inducible promoter, where application of the chemical induces gene expression, or a chemical-repressible promoter, where application of the chemical represses gene expression. Chemical-inducible promoters are known in the art and include, but are not limited to, the maize In2-2 promoter, which is activated by benzenesulfonamide herbicide safeners; the maize GST promoter, which is activated by hydrophobic electrophilic compounds that are used as pre-emergent herbicides; and the tobacco PR-1a promoter, which is activated by salicylic acid. Other chemical-regulated promoters of interest include steroid-responsive promoters. See, for example, the glucocorticoid-inducible promoter in Schena et al. (1991) Proc. Natl. Acad. Sci. USA 88:10421-10425 and McNellis et al. (1998) Plant J. 14(2):247-257, and the tetracycline-inducible and tetracycline-repressible promoters (see, for example, Gatz et al. (1991) Mol. Gen. Genet. 227:229-237, and U.S. Pat. Nos. 5,814,618 and 5,789,156), herein incorporated by reference.

Tissue-preferred promoters can also be utilized to target GAT expression within a particular plant tissue. Tissue-preferred promoters include those disclosed in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res. 6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2):525-535; Canevascini et al. (1996) Plant Physiol. 112(2):513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Lam (1994) Results Probl. Cell Differ. 20:181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4(3):495-505. Such promoters can be modified, if necessary, for weak expression.

Leaf-specific promoters are known in the art. See, for example, Yamamoto et al. (1997) Plant J. 12(2):255-265; Kwon et al. (1994) Plant Physiol. 105:357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Gotor et al. (1993) Plant J. 3:509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590.

Root-preferred promoters are known and can be selected from the many available from the literature or isolated de novo from various compatible species. See, for example, Hire et al. (1992) Plant Mol. Biol. 20(2):207-218 (soybean root-specific glutamine synthetase gene); Keller et al. (1991) Plant Cell 3(10):1051-1061 (root-specific control element in the GRP 1.8 gene of French bean); Sanger et al. (1990) Plant Mol. Biol. 14(3):433-443 (root-specific promoter of the mannopine synthase (MAS) gene of Agrobacterium tumefaciens); and Miao et al. (1991) Plant Cell 3(1):11-22 (full-length cDNA clone encoding cytosolic glutamine synthetase (GS), which is expressed in roots and root nodules of soybean). See also Bogusz et al. (1990) Plant Cell 2(7):633-641, which discloses two root-specific promoters isolated from hemoglobin genes from the nitrogen-fixing nonlegume Parasponia andersonii and the related non-nitrogen-fixing nonlegume Trema tomentosa. The promoters of these genes were linked to a β-glucuronidase reporter gene and introduced into both the nonlegume Nicotiana tabacum and the legume Lotus corniculatus, and in both instances root-specific promoter activity was preserved. Leach et al. (1991) describe their analysis of the promoters of the highly expressed rolC and rolD root-inducing genes of Agrobacterium rhizogenes (see Plant Science (Limerick) 79(1):69-76). They concluded that enhancer and tissue-preferred DNA determinants are dissociated in those promoters. Teeri et al. (1989) EMBO J. 8(2):343-350 used gene fusion to lacZ to show that the Agrobacterium T-DNA gene encoding octopine synthase is especially active in the epidermis of the root tip and that the TR2′ gene is root specific in the intact plant and stimulated by wounding in leaf tissue, which is an especially desirable combination of characteristics for use with an insecticidal or larvicidal gene. The TR1′ gene, fused to nptII (neomycin phosphotransferase II), showed similar characteristics.

Additional root-preferred promoters include the VfENOD-GRP3 gene promoter (Kuster et al. (1995) Plant Mol. Biol. 29(4):759-772); the ZRP2 promoter (U.S. Pat. No. 5,633,636); the IFS1 promoter (U.S. patent application Ser. No. 10/104,706); and the rolB promoter (Capana et al. (1994) Plant Mol. Biol. 25(4):681-691). See also U.S. Pat. Nos. 5,837,876; 5,750,386; 5,459,252; 5,401,836; 5,110,732; and 5,023,179.

“Seed-preferred” promoters include both “seed-specific” promoters (those promoters active during seed development, such as promoters of seed storage proteins) and “seed-germinating” promoters (those promoters active during seed germination). See Thompson et al. (1989) BioEssays 10: 108, herein incorporated by reference. Such seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message); cZ19B1 (maize 19 kDa zein); mi1ps (myo-inositol-1-phosphate synthase); and celA (cellulose synthase) (see U.S. Pat. No. 6,225,529, herein incorporated by reference). Gamma-zein is an endosperm-specific promoter. Glob-1 is an embryo-specific promoter. For dicots, seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like. For monocots, seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, γ-zein, waxy, shrunken 1, shrunken 2, globulin 1, etc. See also WO 00/12733, which discloses seed-preferred promoters from end1 and end2 genes, herein incorporated by reference.

In particular, a strongly or weakly constitutive plant promoter that directs expression of a GAT nucleic acid in all tissues of a plant can be favorably employed. Such promoters are active under most environmental conditions and states of development or cell differentiation. In addition to the promoters mentioned above, examples of constitutive promoters include the 1′- or 2′-promoter of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill. Where overexpression of a GAT polypeptide of the invention is detrimental to the plant, one of skill will recognize that weak constitutive promoters can be used for low levels of expression. Generally, a “weak promoter” is intended to mean a promoter that drives expression of a coding sequence at a low level. By “low level,” levels from about 1/1000 transcripts to about 1/100,000 transcripts, and as low as about 1/500,000 transcripts, per cell are intended. Alternatively, it is recognized that weak promoters also include promoters that are expressed in only a few cells and not in others to give a total low level of expression. Where a promoter is expressed at unacceptably high levels, portions of the promoter sequence can be deleted or modified to decrease expression levels. In those cases where high levels of expression are not harmful to the plant, a strong promoter, e.g., a tRNA or other pol III promoter, or a strong pol II promoter (e.g., the cauliflower mosaic virus (CaMV) 35S promoter), can be used.

Alternatively, a plant promoter can be under environmental control. Such promoters are referred to as “inducible” promoters. Examples of environmental conditions that may alter transcription by inducible promoters include pathogen attack, anaerobic conditions, or the presence of light. In some cases, it is desirable to use promoters that are “tissue-specific” and/or are under developmental control such that the GAT polynucleotide is expressed only in certain tissues or stages of development, e.g., leaves, roots, shoots, etc. Endogenous promoters of genes related to herbicide tolerance and related phenotypes are particularly useful for driving expression of GAT nucleic acids, e.g., promoters of P450 monooxygenases, glutathione-S-transferases, homoglutathione-S-transferases, glyphosate oxidases and 5-enolpyruvylshikimate-3-phosphate synthases.

Tissue-specific promoters can also be used to direct expression of heterologous structural genes, including the GAT polynucleotides described herein. Thus, the promoters can be used in recombinant expression cassettes to drive expression of any gene whose expression is desirable in the transgenic plants of the invention, e.g., GAT and/or other genes conferring herbicide resistance or tolerance, or genes which influence other useful characteristics, e.g., heterosis. Similarly, enhancer elements, e.g., derived from the 5′ regulatory sequences or intron of a heterologous gene, can also be used to improve expression of a heterologous structural gene, such as a GAT polynucleotide.

In general, the particular promoter used in the expression cassette in plants depends on the intended application. Any of a number of promoters which direct transcription in plant cells can be suitable. The promoter can be either constitutive or inducible. In addition to the promoters noted above, promoters of bacterial origin which operate in plants include the octopine synthase promoter, the nopaline synthase promoter and other promoters derived from Ti plasmids. See Herrera-Estrella et al. (1983) Nature 303:209. Viral promoters include the 35S and 19S RNA promoters of CaMV. See Odell et al. (1985) Nature 313:810. Other plant promoters include the ribulose-1,5-bisphosphate carboxylase small subunit promoter and the phaseolin promoter. The promoter sequence from the E8 gene (see Deikman and Fischer (1988) EMBO J. 7:3315) and other genes are also favorably used. Promoters specific for monocotyledonous species are also considered (McElroy and Brettell (1994) “Foreign gene expression in transgenic cereals,” Trends Biotech. 12:62-68). Alternatively, novel promoters with useful characteristics can be identified from any viral, bacterial, or plant source by methods known in the art, including sequence analysis, enhancer or promoter trapping, and the like.

In preparing expression vectors of the invention, sequences other than the promoter and the GAT-encoding gene are also favorably used. If proper polypeptide expression is desired, a polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from T-DNA. Signal/localization peptides, which, e.g., facilitate translocation of the expressed polypeptide to internal organelles (e.g., chloroplasts) or extracellular secretion, can also be employed.

The vector comprising the GAT polynucleotide also can include a marker gene which confers a selectable phenotype on plant cells. For example, the marker may encode biocide tolerance, particularly antibiotic tolerance, such as tolerance to kanamycin, G418, bleomycin, or hygromycin, or herbicide tolerance, such as tolerance to chlorsulfuron or phosphinothricin. Reporter genes, which are used to monitor gene expression and protein localization via visualizable reaction products (e.g., beta-glucuronidase, beta-galactosidase, and chloramphenicol acetyltransferase) or by direct visualization of the gene product itself (e.g., green fluorescent protein, GFP; Sheen et al. (1995) The Plant Journal 8:777), can be used for, e.g., monitoring transient gene expression in plant cells. Transient expression systems can be employed in plant cells, for example, in screening plant cell cultures for herbicide tolerance activities.

Plant Transformation

Protoplasts

Numerous protocols for establishment of transformable protoplasts from a variety of plant types and subsequent transformation of the cultured protoplasts are available in the art and are incorporated herein by reference. For examples, see Hashimoto et al. (1990) Plant Physiol. 93:857; Fowke and Constabel, eds. (1994) Plant Protoplasts; Saunders et al. (1993) Applications of Plant In vitro Technology Symposium, UPM 16-18; and Lyznik et al. (1991) BioTechniques 10:295, each of which is incorporated herein by reference.

Chloroplasts

Chloroplasts are a site of action of some herbicide tolerance activities, and, in some instances, the GAT polynucleotide is fused to a chloroplast transit peptide sequence to facilitate translocation of the gene products into the chloroplasts. In these cases, it can be advantageous to transform the GAT polynucleotide into the chloroplasts of the plant host cells. Numerous methods are available in the art to accomplish chloroplast transformation and expression (e.g., Daniell et al. (1998) Nature Biotech. 16:346; O'Neill et al. (1993) The Plant Journal 3:729; and Maliga (1993) TIBTECH 11:1). The expression construct comprises a transcriptional regulatory sequence functional in plants operably linked to a polynucleotide encoding the GAT polypeptide. Expression cassettes that are designed to function in chloroplasts (such as an expression cassette including a GAT polynucleotide) include the sequences necessary to ensure expression in chloroplasts. Typically, the coding sequence is flanked by two regions of homology to the plastid genome to effect homologous recombination with the chloroplast genome; often a selectable marker gene is also present within the flanking plastid DNA sequences to facilitate selection of genetically stable transformed chloroplasts in the resultant transplastomic plant cells (see, e.g., Maliga (1993) and Daniell (1998) supra, and references cited therein).
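The flanking-homology arrangement described above can be pictured with a deliberately idealized sketch, assuming exact-match homology arms and treating DNA as plain strings. The function name `integrate_by_double_crossover` and all sequences are hypothetical placeholders, not parts of any described vector; real plastid transformation also depends on arm length, sequence identity and selection to homoplasmy, none of which is modeled here.

```python
def integrate_by_double_crossover(genome: str, left_arm: str,
                                  right_arm: str, payload: str) -> str:
    """Idealized double crossover between a construct and a plastid genome.

    The construct is modeled as left_arm + payload + right_arm, where the
    payload would carry, e.g., a selectable marker and a GAT coding sequence.
    Crossovers inside both arms replace whatever lay between the arms in the
    genome with the payload. Only the first exact match of each arm is used.
    """
    left = genome.find(left_arm)
    if left == -1:
        raise ValueError("left homology arm not found in genome")
    left_end = left + len(left_arm)
    right = genome.find(right_arm, left_end)
    if right == -1:
        raise ValueError("right homology arm not found downstream of left arm")
    # Keep both arms; swap only the intervening genomic segment for the payload.
    return genome[:left_end] + payload + genome[right:]


if __name__ == "__main__":
    plastome = "ACGTGGCCCCTTAAACGT"  # hypothetical toy sequence
    print(integrate_by_double_crossover(plastome, "GGCC", "TTAA", "nnnn"))
    # -> ACGTGGCCnnnnTTAAACGT
```

Because plastid genomes carry a large inverted repeat, an arm taken from that region can match twice; the first-match rule in this sketch is a simplification, not a statement about how recombination picks a site.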

General Transformation Methods

DNA constructs of the invention can be introduced into the genome of the desired plant host by a variety of conventional techniques. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, e.g., Payne, Gamborg, Croy, Jones, etc., all supra, as well as, e.g., Weising et al. (1988) Ann. Rev. Genet. 22:421 and U.S. Pat. Nos. 5,889,191, 5,889,190, 5,866,785, 5,589,367 and 5,316,931, herein incorporated by reference.

A variety of other transformation protocols are contemplated in the present invention. Transformation protocols, as well as protocols for introducing nucleotide sequences into plants, may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (U.S. Pat. Nos. 5,563,055 and 5,981,840), direct gene transfer (Paszkowski et al. (1984) EMBO J. 3:2717-2722), and ballistic particle acceleration (see, for example, U.S. Pat. Nos. 4,945,050; 5,879,918; 5,886,244; 5,932,782; Tomes et al. (1995) “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment,” in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, Eds., Gamborg and Phillips (Springer-Verlag, Berlin); McCabe et al. (1988) Biotechnology 6:923-926); and Lec1 transformation (WO 00/28058). See also, Weissinger et al. (1988) Ann. Rev. Genet. 22:421-477; Sanford et al. (1987) Particulate Science and Technology 5:27-37 (onion); Christou et al. (1988) Plant Physiol. 87:671-674 (soybean); McCabe et al. (1988) Bio/Technology 6:923-926 (soybean); Finer and McMullen (1991) In vitro Cell Dev. Biol. 27P:175-182 (soybean); Singh et al. (1998) Theor. Appl. Genet. 96:319-324 (soybean); Datta et al. (1990) Biotechnology 8:736-740 (rice); Klein et al. (1988) Proc. Natl. Acad. Sci. USA 85:4305-4309 (maize); Klein et al. (1988) Biotechnology 6:559-563 (maize); U.S. Pat. Nos. 5,240,855; 5,322,783 and 5,324,646; Klein et al. (1988) Plant Physiol. 91:440-444 (maize); Fromm et al. (1990) Biotechnology 8:833-839 (maize); Hooykaas-Van Slogteren et al. (1984) Nature (London) 311:763-764; U.S. Pat. No. 5,736,369 (cereals); Bytebier et al. (1987) Proc. Natl. Acad. Sci. USA 84:5345-5349 (Liliaceae); De Wet et al. (1985) in The Experimental Manipulation of Ovule Tissues, Eds., Chapman et al. (Longman, New York), pp. 197-209 (pollen); Kaeppler et al. (1990) Plant Cell Reports 9:415-418 and Kaeppler et al. (1992) Theor. Appl. Genet. 84:560-566 (whisker-mediated transformation); D'Halluin et al. (1992) Plant Cell 4:1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12:250-255 and Christou and Ford (1995) Annals of Botany 75:407-413 (rice); Osjoda et al. (1996) Nature Biotechnology 14:745-750 (maize via Agrobacterium tumefaciens); all of which are herein incorporated by reference.

For example, DNA can be introduced directly into the genomic DNA of a plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs can be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the plant cell is infected by the bacteria.

Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al. (1984) EMBO J. 3:2717. Electroporation techniques are described in Fromm et al. (1985) Proc. Nat'l Acad. Sci. USA 82:5824. Ballistic transformation techniques are described in Klein et al. (1987) Nature 327:70; and Weeks et al. Plant Physiol 102:1077.

In some embodiments, Agrobacterium-mediated transformation techniques are used to transfer the GAT sequences of the invention to transgenic plants. Agrobacterium-mediated transformation is widely used for the transformation of dicots; however, certain monocots can also be transformed by Agrobacterium. For example, Agrobacterium transformation of rice is described by Hiei et al. (1994) Plant J. 6:271; U.S. Pat. No. 5,187,073; U.S. Pat. No. 5,591,616; Li et al. (1991) Science in China 34:54; and Raineri et al. (1990) Bio/Technology 8:33. Agrobacterium-mediated transformation of maize, barley, triticale and asparagus has also been described (Xu et al. (1990) Chinese J Bot 2:81).

Agrobacterium-mediated transformation techniques take advantage of the ability of the T-DNA of the tumor-inducing (Ti) plasmid of A. tumefaciens to integrate into a plant cell genome, in order to co-transfer a nucleic acid of interest into a plant cell. Typically, an expression vector is produced wherein the nucleic acid of interest, such as a GAT polynucleotide of the invention, is ligated into an autonomously replicating plasmid which also contains T-DNA sequences. T-DNA sequences typically flank the expression cassette nucleic acid of interest and comprise the integration sequences of the plasmid. In addition to the expression cassette, T-DNA also typically includes a marker sequence, e.g., antibiotic resistance genes. The plasmid with the T-DNA and the expression cassette is then introduced into Agrobacterium cells. Typically, for effective transformation of plant cells, the A. tumefaciens bacterium also possesses the necessary vir regions on a plasmid, or integrated into its chromosome. For a discussion of Agrobacterium-mediated transformation, see Firoozabady and Kuehnle (1995) in Plant Cell Tissue and Organ Culture Fundamental Methods, eds. Gamborg and Phillips.
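Because the T-DNA sequences flank the expression cassette, construct design largely reduces to checking what lies between the borders. The sketch below, assuming each border occurs exactly once and is given as an exact string, returns that segment from a circular plasmid written out as a string. The function name and the placeholder sequences are hypothetical; real borders are roughly 25-bp imperfect repeats and transfer is initiated at the right border, which this string model does not capture.

```python
def t_dna_region(plasmid: str, left_border: str, right_border: str) -> str:
    """Return the segment lying between the left and right border sequences.

    The plasmid is circular, so the string may start anywhere; if the right
    border precedes the left border in the string, the region wraps around.
    Border sequences themselves are excluded, and each border is assumed
    to occur once and not to straddle the string's start/end.
    """
    lb = plasmid.find(left_border)
    rb = plasmid.find(right_border)
    if lb == -1 or rb == -1:
        raise ValueError("border sequence not found")
    start = lb + len(left_border)
    if rb >= start:
        return plasmid[start:rb]  # LB ... cassette ... RB, no wrap
    if rb + len(right_border) <= lb:
        return plasmid[start:] + plasmid[:rb]  # wraps past the string end
    raise ValueError("border sequences overlap")


if __name__ == "__main__":
    # Hypothetical layout: placeholder words stand in for real sequences.
    vector = "LLLL" + "cassette" + "RRRR" + "backbone"
    print(t_dna_region(vector, "LLLL", "RRRR"))  # -> cassette
```

Anything outside the borders (here the "backbone") is, in this idealized model, not expected to reach the plant genome, which is why the marker is placed inside the T-DNA alongside the GAT cassette.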

In certain embodiments the polynucleotides of the present invention can be stacked with any combination of polynucleotide sequences of interest in order to create plants with a desired phenotype. For example, the polynucleotides of the present invention may be stacked with any other polynucleotides encoding polypeptides having pesticidal and/or insecticidal activity, such as Bacillus thuringiensis toxic proteins (described in U.S. Pat. Nos. 5,366,892; 5,747,450; 5,737,514; 5,723,756; 5,593,881; and Geiser et al. (1986) Gene 48:109), lectins (Van Damme et al. (1994) Plant Mol. Biol. 24:825), pentin (described in U.S. Pat. No. 5,981,722), and the like. The combinations generated can also include multiple copies of any one of the polynucleotides of interest. The polynucleotides of the present invention can also be stacked with any other gene or combination of genes to produce plants with a variety of desired trait combinations including, but not limited to, traits desirable for animal feed such as high oil genes (e.g., U.S. Pat. No. 6,232,529); balanced amino acids (e.g., hordothionins (U.S. Pat. Nos. 5,990,389; 5,885,801; 5,885,802; and 5,703,409); barley high lysine (Williamson et al. (1987) Eur. J. Biochem. 165:99-106; and WO 98/20122) and high methionine proteins (Pedersen et al. (1986) J. Biol. Chem. 261:6279; Kirihara et al. (1988) Gene 71:359; and Musumura et al. (1989) Plant Mol. Biol. 12:123)); increased digestibility (e.g., modified storage proteins (U.S. application Ser. No. 10/053,410, filed Nov. 7, 2001); and thioredoxins (U.S. application Ser. No. 10/005,429, filed Dec. 3, 2001)); the disclosures of which are herein incorporated by reference.

The polynucleotides of the present invention can also be stacked with traits desirable for disease or herbicide resistance (e.g., fumonisin detoxification genes (U.S. Pat. No. 5,792,931); avirulence and disease resistance genes (Jones et al. (1994) Science 266:789; Martin et al. (1993) Science 262:1432; Mindrinos et al. (1994) Cell 78:1089); acetolactate synthase (ALS) mutants that lead to herbicide resistance such as the S4 and/or Hra mutations; resistance to inhibitors of glutamine synthetase such as phosphinothricin or basta (e.g., bar gene); and glyphosate resistance (EPSPS gene)); and traits desirable for processing or process products such as high oil (e.g., U.S. Pat. No. 6,232,529); modified oils (e.g., fatty acid desaturase genes (U.S. Pat. No. 5,952,544; WO 94/11516)); modified starches (e.g., ADPG pyrophosphorylases (AGPase), starch synthases (SS), starch branching enzymes (SBE), and starch debranching enzymes (SDBE)); and polymers or bioplastics (e.g., U.S. Pat. No. 5,602,321; beta-ketothiolase, polyhydroxybutyrate synthase, and acetoacetyl-CoA reductase (Schubert et al. (1988) J. Bacteriol. 170:5837-5847) facilitate expression of polyhydroxyalkanoates (PHAs)); the disclosures of which are herein incorporated by reference. One could also combine the polynucleotides of the present invention with polynucleotides providing agronomic traits such as male sterility (e.g., see U.S. Pat. No. 5,583,210), stalk strength, flowering time, or transformation technology traits such as cell cycle regulation or gene targeting (e.g., WO 99/61619, WO 00/17364, and WO 99/25821); the disclosures of which are herein incorporated by reference.

These stacked combinations can be created by any method including, but not limited to, cross-breeding plants by any conventional or TopCross methodology, or genetic transformation. If the traits are stacked by genetically transforming the plants, the polynucleotide sequences of interest can be combined at any time and in any order. For example, a transgenic plant comprising one or more desired traits can be used as the target to introduce further traits by subsequent transformation. The traits can be introduced simultaneously in a co-transformation protocol with the polynucleotides of interest provided by any combination of transformation cassettes. For example, if two sequences will be introduced, the two sequences can be contained in separate transformation cassettes (trans) or contained on the same transformation cassette (cis). Expression of the sequences can be driven by the same promoter or by different promoters. In certain cases, it may be desirable to introduce a transformation cassette that will suppress the expression of the polynucleotide of interest. This may be combined with any combination of other suppression cassettes or overexpression cassettes to generate the desired combination of traits in the plant. It is further recognized that polynucleotide sequences can be stacked at a desired genomic location using a site-specific recombination system. See, for example, WO 99/25821, WO 99/25854, WO 99/25840, WO 99/25855, and WO 99/25853, all of which are herein incorporated by reference.

Regeneration of Transgenic Plants

Transformed plant cells which are derived by plant transformation techniques, including those discussed above, can be cultured to regenerate a whole plant which possesses the transformed genotype (i.e., a GAT polynucleotide), and thus the desired phenotype, such as acquired resistance (i.e., tolerance) to glyphosate or a glyphosate analog. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. For transformation and regeneration of maize see Gordon-Kamm et al., The Plant Cell, 2:603-618 (1990). Alternatively, selection for glyphosate resistance conferred by the GAT polynucleotide of the invention can be performed. Plant regeneration from cultured protoplasts is described in Evans et al. (1983) Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, Macmillan Publishing Company, New York; and Binding (1985) Regeneration of Plants, Plant Protoplasts pp. 21-73, CRC Press, Boca Raton. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et al. (1987) Ann Rev of Plant Phys 38:467. See also, e.g., Payne and Gamborg.

Transformed plant cells, calli or explants can be cultured on regeneration medium in the dark for several weeks, generally about 1 to 3 weeks, to allow the somatic embryos to mature. Preferred regeneration media include media containing MS salts. The plant cells, calli or explants are then typically cultured on rooting medium in a light/dark cycle until shoots and roots develop. Methods for plant regeneration are known in the art and preferred methods are provided by Kamo et al. (Bot. Gaz. 146(3):324-334, 1985); West et al. (The Plant Cell 5:1361-1369, 1993); and Duncan et al. (Planta 165:322-332, 1985).

Small plantlets can then be transferred to tubes containing rooting medium and allowed to grow and develop more roots for approximately another week. The plants can then be transplanted to soil mixture in pots in the greenhouse.

The regeneration of plants containing the foreign gene introduced by Agrobacterium can be achieved as described by Horsch et al., Science, 227:1229-1231 (1985) and Fraley et al., Proc. Natl. Acad. Sci. U.S.A., 80:4803 (1983). This procedure typically produces shoots within two to four weeks, and these transformant shoots are then transferred to an appropriate root-inducing medium containing the selective agent and an antibiotic to prevent bacterial growth. Transgenic plants of the present invention may be fertile or sterile.

As noted above, regeneration can also be obtained from plant callus, explants, organs, or parts thereof (see generally Klee et al., Ann. Rev. of Plant Phys. 38:467-486 (1987)). The regeneration of plants from either single plant protoplasts or various explants is well known in the art. See, for example, Methods for Plant Molecular Biology, A. Weissbach and H. Weissbach, eds., Academic Press, Inc., San Diego, Calif. (1988). For maize cell culture and regeneration see generally, The Maize Handbook, Freeling and Walbot, eds., Springer, New York (1994); Corn and Corn Improvement, 3rd Ed., Sprague and Dudley eds., American Society of Agronomy, Madison, Wis. (1988).

After transformation with Agrobacterium, the explants typically are transferred to selection medium. One of skill will realize that the selection medium depends on the selectable marker that was co-transfected into the explants. After a suitable length of time, transformants will begin to form shoots. After the shoots are about 1-2 cm in length, the shoots should be transferred to a suitable root and shoot medium. Selection pressure should be maintained in the root and shoot medium.
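Since the selection medium follows from the co-transferred marker, and selection pressure is kept through the root and shoot medium, the choice can be captured as a small lookup. The marker-to-agent pairs below are common associations (nptII with kanamycin or G418, hpt with hygromycin, bar with phosphinothricin, a mutant ALS with chlorsulfuron, and GAT with glyphosate) listed for illustration only; the table, stage labels and `selection_plan` function are hypothetical, and concentrations are deliberately omitted because they depend on species and explant.

```python
from typing import Dict, List, Tuple

# Illustrative marker -> selective agent table (first agent is the default).
SELECTIVE_AGENTS: Dict[str, Tuple[str, ...]] = {
    "nptII": ("kanamycin", "G418"),
    "hpt": ("hygromycin",),
    "ble": ("bleomycin",),
    "bar": ("phosphinothricin",),
    "mutant ALS": ("chlorsulfuron",),
    "GAT": ("glyphosate",),
}

# Media that follow transformation; each keeps the same selection pressure.
STAGES = ("selection medium", "root and shoot medium")


def selection_plan(marker: str) -> List[Tuple[str, str]]:
    """Pair each post-transformation medium with the agent for `marker`."""
    try:
        agent = SELECTIVE_AGENTS[marker][0]
    except KeyError:
        raise ValueError(f"no selective agent recorded for {marker!r}") from None
    return [(stage, agent) for stage in STAGES]


if __name__ == "__main__":
    for stage, agent in selection_plan("GAT"):
        print(f"{stage}: {agent}")
```

Using the GAT polynucleotide itself as the marker, as contemplated elsewhere in this specification, simply means glyphosate appears at every stage.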

Typically, the transformants will develop roots in about 1-2 weeks and form plantlets. After the plantlets are about 3-5 cm in height, they are placed in sterile soil in fiber pots. Those of skill in the art will realize that different acclimation procedures are used to obtain transformed plants of different species. For example, after developing a root and shoot, cuttings, as well as somatic embryos of transformed plants, are transferred to medium for establishment of plantlets. For a description of selection and regeneration of transformed plants, see, e.g., Dodds and Roberts (1995) Experiments in Plant Tissue Culture, 3rd Ed., Cambridge University Press.

There are also methods for Agrobacterium transformation of Arabidopsis using vacuum infiltration (Bechtold N., Ellis J. and Pelletier G., 1993, In planta Agrobacterium mediated gene transfer by infiltration of adult Arabidopsis thaliana plants. CR Acad Sci Paris Life Sci 316:1194-1199) and simple dipping of flowering plants (Desfeux, C., Clough S. J., and Bent A. F., 2000, Female reproductive tissues are the primary target of Agrobacterium-mediated transformation by the Arabidopsis floral-dip method. Plant Physiol. 123:895-904). Using these methods, transgenic seed are produced without the need for tissue culture.

There are plant varieties for which effective Agrobacterium-mediated transformation protocols have yet to be developed. For example, successful tissue transformation coupled with regeneration of the transformed tissue to produce a transgenic plant has not been reported for some of the most commercially relevant cotton cultivars. Nevertheless, an approach that can be used with these plants involves stably introducing the polynucleotide into a related plant variety via Agrobacterium-mediated transformation, confirming operability, and then transferring the transgene to the desired commercial strain using standard sexual crossing or back-crossing techniques. For example, in the case of cotton, Agrobacterium can be used to transform a Coker line of Gossypium hirsutum (e.g., Coker lines 310, 312, 5110, Deltapine 61 or Stoneville 213), and then the transgene can be introduced into another more commercially relevant G. hirsutum cultivar by back-crossing.
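How many back-crosses such a transfer needs can be estimated from the standard expectation for recurrent-parent genome recovery: the F1 between the transformed line and the commercial cultivar carries half of each genome, and each back-cross to the commercial cultivar halves the remaining donor share. The sketch below computes that expectation exactly with fractions; the function name is hypothetical, and the figure is a genome-wide average that ignores linkage drag around the selected transgene.

```python
from fractions import Fraction


def recurrent_parent_fraction(backcrosses: int) -> Fraction:
    """Expected share of the recurrent (commercial) parent's genome.

    The F1 between the transformed donor line and the commercial cultivar
    carries 1/2; each backcross to the commercial cultivar halves the
    remaining donor share, giving 1 - (1/2)**(n + 1) after n backcrosses.
    This is a genome-wide average: DNA linked to the selected transgene is
    lost more slowly.
    """
    if backcrosses < 0:
        raise ValueError("number of backcrosses must be non-negative")
    return 1 - Fraction(1, 2) ** (backcrosses + 1)


if __name__ == "__main__":
    for n in range(7):
        share = recurrent_parent_fraction(n)
        label = "F1" if n == 0 else f"BC{n}"
        print(f"{label}: {share} ({float(share):.1%})")
```

Under this expectation the first back-cross generation recovers 3/4 of the commercial genome and the second 7/8, which is why several rounds of back-crossing, each selecting for the transgene, are usually planned.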

The transgenic plants of this invention can be characterized either genotypically or phenotypically to determine the presence of the GAT polynucleotide of the invention. Genotypic analysis can be performed by any of a number of well-known techniques, including PCR amplification of genomic DNA and hybridization of genomic DNA with specific labeled probes. Phenotypic analysis includes, e.g., survival of plants or plant tissues exposed to a selected herbicide such as glyphosate.
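PCR genotyping rests on a simple prediction: a product of the expected size appears only when both primer sites are present, in the right order, in the template. The sketch below makes that prediction for exact primer matches only, ignoring mismatches, melting temperature and product-length limits. The function names, primers and sequences are hypothetical toy values, not assay designs from this specification.

```python
from typing import Optional

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def expected_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Predict the PCR product from exact primer matches on the top strand.

    `forward` must match the template as written; `reverse` is given 5'->3'
    as synthesized, so its binding site on the top strand is its reverse
    complement. Returns None when no product is predicted.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


if __name__ == "__main__":
    # Hypothetical toy sequences; a real assay would use event-specific primers.
    transgenic = "CCCCATGAAAGTGTGTCTTCGACCCC"
    wild_type = "CCCCGGGGCCCC"
    for name, dna in (("transgenic", transgenic), ("wild type", wild_type)):
        product = expected_amplicon(dna, "ATGAAA", "TCGAAG")
        print(name, len(product) if product else "no product")
```

In this toy case the transgenic template yields an 18-bp product and the non-transgenic template yields none, mirroring the presence/absence call made on a gel.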

One of skill will recognize that after the expression cassette containing the GAT gene is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.

In vegetatively propagated crops, mature transgenic plants can be propagated by the taking of cuttings or by tissue culture techniques to produce multiple identical plants. Selection of desirable transgenics is made and new varieties are obtained and propagated vegetatively for commercial use. In seed-propagated crops, mature transgenic plants can be self-crossed to produce a homozygous inbred plant. The inbred plant produces seed containing the newly introduced heterologous nucleic acid. These seeds can be grown to produce plants that would produce the selected phenotype.

Parts obtained from the regenerated plant, such as flowers, seeds, leaves, branches, fruit, and the like, are included in the invention, provided that these parts comprise cells comprising the isolated GAT nucleic acid. Progeny, variants and mutants of the regenerated plants are also included within the scope of the invention, provided that they comprise the introduced nucleic acid sequences.

Transgenic plants expressing a selectable marker can be screened for transmission of the GAT nucleic acid, for example, by standard immunoblot and DNA detection techniques. Transgenic lines are also typically evaluated on levels of expression of the heterologous nucleic acid. Expression at the RNA level can be determined initially to identify and quantitate expression-positive plants. Standard techniques for RNA analysis can be employed and include PCR amplification assays using oligonucleotide primers designed to amplify only the heterologous RNA templates and solution hybridization assays using heterologous nucleic acid-specific probes. The RNA-positive plants can then be analyzed for protein expression by Western immunoblot analysis using the specifically reactive antibodies of the present invention. In addition, in situ hybridization and immunocytochemistry according to standard protocols can be done using heterologous nucleic acid-specific polynucleotide probes and antibodies, respectively, to localize sites of expression within transgenic tissue. Generally, a number of transgenic lines are screened for the incorporated nucleic acid to identify and select plants with the most appropriate expression profiles.
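The text leaves the PCR quantitation method open. One widely used option for comparing transcript levels across lines, swapped in here as an assumption, is the comparative Ct (2^-ΔΔCt) method: the transgene's threshold cycle is normalized to a reference gene in each sample and then to a calibrator sample, presuming near-100% amplification efficiency for both amplicons. Because non-transgenic plants give no transgene signal, the calibrator here is one transgenic line chosen as a reference. The function names, event names and Ct values are hypothetical.

```python
from typing import Dict, List, Tuple


def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float) -> float:
    """Fold change of the target transcript versus a calibrator sample.

    delta Ct = Ct(target) - Ct(reference gene) in each sample;
    delta-delta Ct = delta Ct(sample) - delta Ct(calibrator);
    fold change = 2 ** -(delta-delta Ct), assuming ~100% PCR efficiency.
    """
    delta_sample = ct_target - ct_reference
    delta_cal = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_cal)


def rank_lines(cts: Dict[str, Tuple[float, float]],
               calibrator: str) -> List[Tuple[str, float]]:
    """Rank lines by expression relative to `calibrator`, highest first."""
    cal_target, cal_ref = cts[calibrator]
    scored = [(line, relative_expression(t, r, cal_target, cal_ref))
              for line, (t, r) in cts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical (transgene Ct, reference-gene Ct) pairs per event.
    cts = {"event A": (25.0, 18.0), "event B": (22.0, 18.0), "event C": (27.0, 19.0)}
    for line, fold in rank_lines(cts, calibrator="event A"):
        print(f"{line}: {fold:.2f}x")
```

With these made-up values event B ranks first (8-fold over event A) and event C last (0.5-fold); lines chosen this way would still be confirmed at the protein level by immunoblot, as described above.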

A preferred embodiment is a transgenic plant that is homozygous for the added heterologous nucleic acid; i.e., a transgenic plant that contains two added nucleic acid sequences, one gene at the same locus on each chromosome of a chromosome pair. A homozygous transgenic plant can be obtained by sexually mating (selfing) a heterozygous transgenic plant that contains a single added heterologous nucleic acid, germinating some of the seed produced, and analyzing the resulting plants, alongside a control plant (i.e., native, non-transgenic), to identify those that carry the added nucleic acid at the same locus on both chromosomes. Back-crossing to a parental plant and out-crossing with a non-transgenic plant are also contemplated.
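The selfing route rests on single-locus Mendelian segregation. Assuming one insertion whose phenotype is scored as dominant (for example, glyphosate survival), selfing a hemizygous plant gives 1/4 homozygous, 1/2 hemizygous and 1/4 null progeny, so only 1/3 of the surviving progeny are homozygous. A candidate can then be confirmed by progeny-testing its own selfed seed: a homozygote gives no null seedlings, whereas a hemizygote gives none among k seedlings with probability (3/4)^k. The function names and the 5% threshold in the example are illustrative choices, not parameters from the specification.

```python
from fractions import Fraction
from typing import Dict

LABELS = {2: "homozygous", 1: "hemizygous", 0: "null"}


def selfed_progeny_classes() -> Dict[str, Fraction]:
    """Genotype frequencies after selfing a plant hemizygous for one insertion."""
    classes: Dict[str, Fraction] = {}
    for egg in (1, 0):          # 1 = gamete carries the insertion
        for pollen in (1, 0):
            label = LABELS[egg + pollen]
            classes[label] = classes.get(label, Fraction(0)) + Fraction(1, 4)
    return classes


def progeny_to_score(alpha: float) -> int:
    """Smallest k for which a hemizygote shows no null seedling among k
    selfed progeny with probability below alpha, i.e. (3/4)**k < alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    k, p_no_null = 0, 1.0
    while p_no_null >= alpha:
        k += 1
        p_no_null *= 0.75
    return k


if __name__ == "__main__":
    classes = selfed_progeny_classes()
    carriers = classes["homozygous"] + classes["hemizygous"]
    print(classes)
    print("homozygous share of carriers:", classes["homozygous"] / carriers)
    print("progeny to score at alpha = 0.05:", progeny_to_score(0.05))
```

At a 5% risk of mistaking a hemizygote for a homozygote, this rule calls for scoring 11 selfed progeny per candidate, since (3/4)^10 is still above 0.05 while (3/4)^11 is below it.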

Essentially any plant can be transformed with the GAT polynucleotides of the invention. Suitable plants for the transformation and expression of the novel GAT polynucleotides of this invention include agronomically and horticulturally important species. Such species include, but are not restricted to, members of the families: Gramineae (including corn, rye, triticale, barley, millet, rice, wheat, oats, etc.); Leguminosae (including pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, and sweetpea); Compositae (the largest family of vascular plants, including at least 1,000 genera, including important commercial crops such as sunflower); and Rosaceae (including raspberry, apricot, almond, peach, rose, etc.); as well as nut plants (including walnut, pecan, hazelnut, etc.); and forest trees (including Pinus, Quercus, Pseudotsuga, Sequoia, Populus, etc.).

Additional targets for modification by the GAT polynucleotides of the invention, as well as those specified above, include plants from the genera: Agrostis, Allium, Antirrhinum, Apium, Arachis, Asparagus, Atropa, Avena (e.g., oats), Bambusa, Brassica, Bromus, Browallia, Camellia, Cannabis, Capsicum, Cicer, Chenopodium, Cichorium, Citrus, Coffea, Coix, Cucumis, Cucurbita, Cynodon, Dactylis, Datura, Daucus, Digitalis, Dioscorea, Elaeis, Eleusine, Festuca, Fragaria, Geranium, Gossypium, Glycine, Helianthus, Hemerocallis, Hevea, Hordeum (e.g., barley), Hyoscyamus, Ipomoea, Lactuca, Lens, Lilium, Linum, Lolium, Lotus, Lycopersicon, Majorana, Malus, Mangifera, Manihot, Medicago, Nemesia, Nicotiana, Onobrychis, Oryza (e.g., rice), Panicum, Pelargonium, Pennisetum (e.g., millet), Petunia, Pisum, Phaseolus, Phleum, Poa, Prunus, Ranunculus, Raphanus, Ribes, Ricinus, Rubus, Saccharum, Salpiglossis, Secale (e.g., rye), Senecio, Setaria, Sinapis, Solanum, Sorghum, Stenotaphrum, Theobroma, Trifolium, Trigonella, Triticum (e.g., wheat), Vicia, Vigna, Vitis, Zea (e.g., corn), and the Olyreae, the Pharoideae and many others. As noted, plants in the family Gramineae are particularly desirable target plants for the methods of the invention.

Common crop plants which are targets of the present invention include corn, rice, triticale, rye, cotton, soybean, sorghum, wheat, oats, barley, millet, sunflower, canola, peas, beans, lentils, peanuts, yam beans, cowpeas, velvet beans, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, sweetpea and nut plants (e.g., walnut, pecan, etc.).

In one aspect, the invention provides a method for producing a crop by growing a crop plant that is glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate N-acetyltransferase, under conditions such that the crop plant produces a crop, and harvesting the crop. Preferably, glyphosate is applied to the plant, or in the vicinity of the plant, at a concentration effective to control weeds without preventing the transgenic crop plant from growing and producing the crop. The application of glyphosate can be before planting, or at any time after planting up to and including the time of harvest. Glyphosate can be applied once or multiple times. The timing of glyphosate application, amount applied, mode of application, and other parameters will vary based upon the specific nature of the crop plant and the growing environment, and can be readily determined by one of skill in the art. The invention further provides a crop produced by this method.

The invention provides for the propagation of a plant containing a GAT polynucleotide transgene. The plant can be, for example, a monocot or a dicot. In one aspect, propagation entails crossing a plant containing a GAT polynucleotide transgene with a second plant, such that at least some progeny of the cross display glyphosate tolerance.

In one aspect, the invention provides a method for selectively controlling weeds in a field where a crop is being grown. The method involves planting crop seeds or plants that are glyphosate-tolerant as a result of being transformed with a gene encoding a GAT, e.g., a GAT polynucleotide, and applying to the crop and any weeds a sufficient amount of glyphosate to control the weeds without a significant adverse impact on the crop. It is important to note that it is not necessary for the crop to be totally insensitive to the herbicide, so long as the benefit derived from the inhibition of weeds outweighs any negative impact of the glyphosate or glyphosate analog on the crop or crop plant.

In another aspect, the invention provides for use of a GAT polynucleotide as a selectable marker gene. In this embodiment of the invention, the presence of the GAT polynucleotide in a cell or organism confers upon the cell or organism the detectable phenotypic trait of glyphosate resistance, thereby allowing one to select for cells or organisms that have been transformed with a gene of interest linked to the GAT polynucleotide. Thus, for example, the GAT polynucleotide can be introduced into a nucleic acid construct, e.g., a vector, thereby allowing for the identification of a host (e.g., a cell or transgenic plant) containing the nucleic acid construct by growing the host in the presence of glyphosate and selecting for the ability to survive and/or grow at a rate that is discernibly greater than that of a host lacking the nucleic acid construct. A GAT polynucleotide can be used as a selectable marker in a wide variety of hosts that are sensitive to glyphosate, including plants, most bacteria (including E. coli), actinomycetes, yeasts, algae and fungi. One benefit of using herbicide resistance as a marker in plants, as opposed to conventional antibiotic resistance, is that it obviates the concern of some members of the public that antibiotic resistance might escape into the environment. Experimental data demonstrating the use of a GAT polynucleotide as a selectable marker in diverse host systems are described in the Examples section of this specification.
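One way to make "a rate that is discernibly greater" concrete for liquid cultures is sketched below, assuming both cultures are in exponential phase: the specific growth rate μ = ln(OD2/OD1)/Δt is computed for a host carrying the construct and for a control host lacking it, both in glyphosate-containing medium, and the construct is scored as present when the difference exceeds a chosen margin. The 0.1 per hour margin, the OD readings and the function names are hypothetical; the specification does not say how the comparison is measured.

```python
import math


def specific_growth_rate(od_start: float, od_end: float, hours: float) -> float:
    """Exponential-phase specific growth rate (per hour) from two OD readings."""
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("optical densities and elapsed time must be positive")
    return math.log(od_end / od_start) / hours


def discernibly_faster(candidate_rate: float, control_rate: float,
                       min_difference: float = 0.1) -> bool:
    """True when the candidate outgrows the control by at least min_difference per hour."""
    return candidate_rate - control_rate >= min_difference


if __name__ == "__main__":
    # Hypothetical OD600 readings taken 3 h apart in glyphosate medium.
    with_construct = specific_growth_rate(0.05, 0.40, 3.0)   # ln(8)/3, i.e. ln 2 per hour
    without_construct = specific_growth_rate(0.05, 0.06, 3.0)
    print(round(with_construct, 3), round(without_construct, 3),
          discernibly_faster(with_construct, without_construct))
```

A difference-based margin is used rather than a ratio because a glyphosate-sensitive control may show a rate near zero or even negative, which would make a ratio unstable.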

Selection of GAT Polynucleotides Conferring Enhanced Glyphosate Resistance in Transgenic Plants.

Libraries of GAT encoding nucleic acids diversified according to the methods described herein can be selected for the ability to confer resistance to glyphosate in transgenic plants. Following one or more cycles of diversification and selection, the modified GAT genes can be used as a selection marker to facilitate the production and evaluation of transgenic plants and as a means of conferring herbicide resistance in experimental or agricultural plants. For example, after diversification of any one or more of, e.g., SEQ ID NO: 1-5 to produce a library of diversified GAT polynucleotides, an initial functional evaluation can be performed by expressing the library of GAT encoding sequences in E. coli. The expressed GAT polypeptides can be purified, or partially purified as described above, and screened for improved kinetics by mass spectrometry. Following one or more preliminary rounds of diversification and selection, the polynucleotides encoding improved GAT polypeptides are cloned into a plant expression vector, operably linked to, e.g., a strong constitutive promoter, such as the CaMV 35S promoter. The expression vectors comprising the modified GAT nucleic acids are transformed, typically by Agrobacterium-mediated transformation, into Arabidopsis thaliana host plants. For example, Arabidopsis hosts are readily transformed by dipping inflorescences into solutions of Agrobacterium and allowing them to grow and set seed. Thousands of seeds are recovered in approximately 6 weeks. The seeds are then collected in bulk from the dipped plants and germinated in soil. In this manner it is possible to generate several thousand independently transformed plants for evaluation, constituting a high throughput (HTP) plant transformation format. Bulk-grown seedlings are sprayed with glyphosate; seedlings exhibiting glyphosate resistance survive the selection process, whereas non-transgenic plants and plants incorporating less favorably modified GAT nucleic acids are damaged or killed by the herbicide treatment. Optionally, the GAT encoding nucleic acids conferring improved resistance to glyphosate are recovered, e.g., by PCR amplification using T-DNA primers flanking the library inserts, and used in further diversification procedures or to produce additional transgenic plants of the same or different species. If desired, additional rounds of diversification and selection can be performed using increasing concentrations of glyphosate in each subsequent selection. In this manner, GAT polynucleotides and polypeptides conferring resistance to concentrations of glyphosate useful in field conditions can be obtained.
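The iterative scheme of diversifying, selecting at a given glyphosate dose and then raising the dose can be illustrated with a toy simulation. Every modeling choice here is an assumption rather than the described method: each variant is reduced to a single hypothetical tolerance value (the highest dose, in arbitrary units, that a plant expressing it survives), diversification multiplies that value by log-normal noise, and a bounded number of the best survivors stands in for PCR recovery of library inserts. The function `evolve` and all parameter values are hypothetical.

```python
import random
from typing import List, Tuple


def evolve(start: List[float], doses: List[float], offspring_per_parent: int = 20,
           noise_sd: float = 0.15, max_pool: int = 50, seed: int = 0
           ) -> Tuple[List[float], List[int]]:
    """Toy diversify-and-select loop with a rising glyphosate dose per round.

    Returns the pool carried out of the last successful round (the starting
    pool if none succeeded) and the number of survivors in each round
    attempted. Stops early if a round leaves no survivors.
    """
    rng = random.Random(seed)
    pool = list(start)
    survivors_per_round: List[int] = []
    for dose in doses:
        # Diversification: each parent yields variants with perturbed tolerance.
        library = [t * rng.lognormvariate(0.0, noise_sd)
                   for t in pool for _ in range(offspring_per_parent)]
        # Selection: only variants tolerating this round's dose survive spraying.
        survivors = [t for t in library if t >= dose]
        survivors_per_round.append(len(survivors))
        if not survivors:
            break  # a real screen might lower the dose or rediversify here
        # Recovery: carry forward a bounded set of the most tolerant survivors.
        pool = sorted(survivors, reverse=True)[:max_pool]
    return pool, survivors_per_round


if __name__ == "__main__":
    final_pool, counts = evolve([1.0] * 10, doses=[1.0, 1.2, 1.4, 1.6])
    print("survivors per round:", counts)
    print("best tolerance recovered:", round(max(final_pool), 2))
```

Raising the dose each round keeps the selection threshold just above what the current pool tolerates, which is the intuition behind using increasing glyphosate concentrations in each subsequent selection.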

Herbicide Resistance

The present invention provides a composition comprising two or more polynucleotides of the invention. Preferably, the GAT polynucleotides encode GAT polypeptides having different kinetic parameters; e.g., a GAT variant having a lower K_(M) can be combined with one having a higher k_(cat). In a further embodiment, the different GAT polynucleotides may be coupled to a chloroplast transit sequence or other signal sequence, thereby directing the GAT polypeptides to different cellular compartments or organelles, or directing secretion of one or more of the GAT polypeptides.
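To illustrate why pairing a low-K_(M) variant with a high-k_(cat) variant might be useful, the following Python sketch evaluates the Michaelis-Menten turnover of each variant at a low and a high glyphosate concentration. It borrows the glyphosate constants reported in Example 7 for SEQ ID NO: 457 (118 min⁻¹, 0.1 mM) and SEQ ID NO: 300 (296 min⁻¹, 0.65 mM), and it assumes saturating acetyl CoA, equal amounts of each enzyme and simple additivity of the two activities. These are simplifying assumptions, not measured behavior of a combined composition, and the helper names are hypothetical.

```python
# Sketch: per-enzyme turnover of two GAT variants under Michaelis-Menten kinetics.
# Assumptions (not from the source): saturating acetyl CoA, equal enzyme amounts,
# and activities that simply add when both variants are present.


def mm_turnover(kcat_per_min: float, km_mm: float, glyphosate_mm: float) -> float:
    """Turnover (min^-1) = kcat * [S] / (KM + [S])."""
    return kcat_per_min * glyphosate_mm / (km_mm + glyphosate_mm)


# Glyphosate constants reported in Example 7 of this document.
LOW_KM_VARIANT = (118.0, 0.10)     # SEQ ID NO: 457: kcat (min^-1), KM (mM)
HIGH_KCAT_VARIANT = (296.0, 0.65)  # SEQ ID NO: 300


def compare_variants(glyphosate_mm: float) -> dict:
    """Turnover of each variant and of the hypothetical additive mixture."""
    low_km = mm_turnover(*LOW_KM_VARIANT, glyphosate_mm)
    high_kcat = mm_turnover(*HIGH_KCAT_VARIANT, glyphosate_mm)
    return {"low_km": low_km, "high_kcat": high_kcat, "combined": low_km + high_kcat}


if __name__ == "__main__":
    for s in (0.05, 5.0):
        r = compare_variants(s)
        print(f"{s:>5} mM glyphosate: low-KM {r['low_km']:.1f}, "
              f"high-kcat {r['high_kcat']:.1f}, combined {r['combined']:.1f} min^-1")
```

Under these assumptions the low-K_(M) variant dominates at 0.05 mM glyphosate (about 39 versus 21 min⁻¹), while the high-k_(cat) variant dominates at 5 mM (about 262 versus 116 min⁻¹).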

The mechanism of glyphosate resistance of the present invention can be combined with other modes of glyphosate resistance known in the art to produce plants and plant explants with superior glyphosate resistance. For example, glyphosate-tolerant plants can be produced by inserting into the genome of the plant the capacity to produce a higher level of 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), as more fully described in U.S. Pat. Nos. 6,248,876 B1; 5,627,061; 5,804,425; 5,633,435; 5,145,783; 4,971,908; 5,312,910; 5,188,642; 4,940,835; 5,866,775; 6,225,114 B1; 6,130,366; 5,310,667; 4,535,060; 4,769,061; 5,633,448; 5,510,471; Re. 36,449; RE 37,287 E; and 5,491,288; and international publications WO 97/04103; WO 00/66746; WO 01/66704; and WO 00/66747, which are incorporated herein by reference in their entireties for all purposes. Glyphosate resistance is also imparted to plants that express a gene encoding a glyphosate oxido-reductase enzyme, as described more fully in U.S. Pat. Nos. 5,776,760 and 5,463,175, which are incorporated herein by reference in their entireties for all purposes.

Further, the mechanism of glyphosate resistance of the present invention may be combined with other modes of herbicide resistance to provide plants and plant explants that are resistant to glyphosate and one or more other herbicides. For example, the hydroxyphenylpyruvate dioxygenases are enzymes that catalyze the reaction in which para-hydroxyphenylpyruvate (HPP) is transformed into homogentisate. Molecules that bind to and inhibit this enzyme, thereby blocking the transformation of HPP into homogentisate, are useful as herbicides. Plants more resistant to certain herbicides are described in U.S. Pat. Nos. 6,245,968 B1; 6,268,549; and 6,069,115; and international publication WO 99/23886, which are incorporated herein by reference in their entireties for all purposes.

Sulfonylurea and imidazolinone herbicides also inhibit growth of higher plants by blocking acetolactate synthase (ALS), also known as acetohydroxy acid synthase (AHAS). The production of sulfonylurea- and imidazolinone-tolerant plants is described more fully in U.S. Pat. Nos. 5,605,011; 5,013,659; 5,141,870; 5,767,361; 5,731,180; 5,304,732; 4,761,373; 5,331,107; 5,928,937; and 5,378,824; and international publication WO 96/33270, which are incorporated herein by reference in their entireties for all purposes.

Glutamine synthetase (GS) appears to be essential for the development and survival of most plant cells. Inhibitors of GS are toxic to plant cells. Glufosinate herbicides have been developed based on the toxic effect of GS inhibition in plants. These herbicides are non-selective: they inhibit growth of all plant species present, causing their total destruction. The development of plants containing an exogenous phosphinothricin acetyltransferase is described in U.S. Pat. Nos. 5,969,213; 5,489,520; 5,550,318; 5,874,265; 5,919,675; 5,561,236; 5,648,477; 5,646,024; 6,177,616 B1; and 5,879,903, which are incorporated herein by reference in their entireties for all purposes.

Protoporphyrinogen oxidase (protox) is necessary for the production of chlorophyll, which is essential for plant survival. The protox enzyme serves as the target for a variety of herbicidal compounds. These herbicides also inhibit growth of all plant species present, causing their total destruction. The development of plants containing altered protox activity that are resistant to these herbicides is described in U.S. Pat. Nos. 6,288,306 B1; 6,282,837 B1; and 5,767,373; and international publication WO 01/12825, which are incorporated herein by reference in their entireties for all purposes.

Accordingly, the invention provides methods for selectively controlling weeds in a field containing a crop that involve planting the field with crop seeds or plants which are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate N-acetyltransferase, and applying to the crop and weeds in the field a sufficient amount of glyphosate to control the weeds without significantly affecting the crop.

The invention further provides methods for controlling weeds in a field and preventing the emergence of glyphosate-resistant weeds in a field containing a crop, which involve planting the field with crop seeds or plants that are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate N-acetyltransferase and a gene encoding a polypeptide imparting glyphosate tolerance by another mechanism, such as a glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase and/or a glyphosate-tolerant glyphosate oxido-reductase, and applying to the crop and the weeds in the field a sufficient amount of glyphosate to control the weeds without significantly affecting the crop.

In a further embodiment, the invention provides methods for controlling weeds in a field and preventing the emergence of herbicide-resistant weeds in a field containing a crop, which involve planting the field with crop seeds or plants that are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate N-acetyltransferase; a gene encoding a polypeptide imparting glyphosate tolerance by another mechanism, such as a glyphosate-tolerant 5-enolpyruvylshikimate-3-phosphate synthase and/or a glyphosate-tolerant glyphosate oxido-reductase; and a gene encoding a polypeptide imparting tolerance to an additional herbicide, such as a mutated hydroxyphenylpyruvate dioxygenase, a sulfonamide-tolerant acetolactate synthase, a sulfonamide-tolerant acetohydroxy acid synthase, an imidazolinone-tolerant acetolactate synthase, an imidazolinone-tolerant acetohydroxy acid synthase, a phosphinothricin acetyltransferase and a mutated protoporphyrinogen oxidase; and applying to the crop and the weeds in the field a sufficient amount of glyphosate and an additional herbicide, such as a hydroxyphenylpyruvate dioxygenase inhibitor, sulfonamide, imidazolinone, bialaphos, phosphinothricin, azafenidin, butafenacil, sulfosate, glufosinate, and a protox inhibitor, to control the weeds without significantly affecting the crop.

The invention further provides methods for controlling weeds in a field and preventing the emergence of herbicide-resistant weeds in a field containing a crop, which involve planting the field with crop seeds or plants that are glyphosate-tolerant as a result of being transformed with a gene encoding a glyphosate N-acetyltransferase and a gene encoding a polypeptide imparting tolerance to an additional herbicide, such as a mutated hydroxyphenylpyruvate dioxygenase, a sulfonamide-tolerant acetolactate synthase, a sulfonamide-tolerant acetohydroxy acid synthase, an imidazolinone-tolerant acetolactate synthase, an imidazolinone-tolerant acetohydroxy acid synthase, a phosphinothricin acetyltransferase and a mutated protoporphyrinogen oxidase, and applying to the crop and the weeds in the field a sufficient amount of glyphosate and an additional herbicide, such as a hydroxyphenylpyruvate dioxygenase inhibitor, sulfonamide, imidazolinone, bialaphos, phosphinothricin, azafenidin, butafenacil, sulfosate, glufosinate, and a protox inhibitor, to control the weeds without significantly affecting the crop.

EXAMPLES

The following examples are illustrative and not limiting. One of skill will recognize a variety of non-critical parameters that can be altered to achieve essentially similar results.

Example 1 Isolating Novel Native GAT Polynucleotides

Five native GAT polynucleotides (i.e., GAT polynucleotides that occur naturally in a non-genetically modified organism) were discovered by expression cloning of sequences from Bacillus strains exhibiting GAT activity. Their nucleotide sequences were determined and are provided herein as SEQ ID NO: 1-5. Briefly, a collection of approximately 500 Bacillus and Pseudomonas strains was screened for native ability to N-acetylate glyphosate. Strains were grown in LB overnight, harvested by centrifugation, permeabilized in dilute toluene, and then washed and resuspended in a reaction mix containing buffer, 5 mM glyphosate, and 200 μM acetyl CoA. The cells were incubated in the reaction mix for between 1 and 48 hours, at which time an equal volume of methanol was added to the reaction. The cells were then pelleted by centrifugation and the supernatant was filtered before analysis by parent ion mode mass spectrometry. The product of the reaction was positively identified as N-acetylglyphosate by comparing the mass spectrometry profile of the reaction mix to an N-acetylglyphosate standard, as shown in FIG. 2. Product detection depended on the inclusion of both substrates (acetyl CoA and glyphosate) and was abolished by heat denaturation of the bacterial cells.

Individual GAT polynucleotides were then cloned from the identified strains by functional screening. Genomic DNA was prepared and partially digested with Sau3AI. Fragments of approximately 4 kb were cloned into an E. coli expression vector and transformed into electrocompetent E. coli. Individual clones exhibiting GAT activity were identified by mass spectrometry following a reaction as described above, except that the toluene wash was replaced by permeabilization with PMBS. Genomic fragments were sequenced and the putative GAT polypeptide-encoding open reading frame was identified. The identity of the GAT gene was confirmed by expression of the open reading frame in E. coli and detection of high levels of N-acetylglyphosate produced in reaction mixtures.

Example 2 Characterization of a GAT Polypeptide Isolated from B. licheniformis Strain B6

Genomic DNA from B. licheniformis strain B6 was purified and partially digested with Sau3AI, and fragments of 1-10 kb were cloned into an E. coli expression vector. A clone with a 2.5 kb insert conferred glyphosate N-acetyltransferase (GAT) activity on the E. coli host, as determined by mass spectrometry analysis. Sequencing of the insert revealed a single complete open reading frame of 441 base pairs. Subsequent cloning of this open reading frame confirmed that it encoded the GAT enzyme. A plasmid, pMAXY2120, is shown in FIG. 4. The gene encoding the GAT enzyme of B6 was transformed into E. coli strain XL1 Blue. A 10% inoculum of a saturated culture was added to Luria broth, and the culture was incubated at 37° C. for 1 hr. Expression of GAT was induced by the addition of IPTG to a concentration of 1 mM. The culture was incubated for a further 4 hrs, after which cells were harvested by centrifugation and the cell pellet was stored at −80° C.

Cells were lysed by adding 1 ml of the following buffer to 0.2 g of cells: 25 mM HEPES, pH 7.3, 100 mM KCl and 10% methanol (HKM) plus 0.1 mM EDTA, 1 mM DTT, 1 mg/ml chicken egg lysozyme, and a protease inhibitor cocktail obtained from Sigma and used according to the manufacturer's recommendations. After 20 minutes of incubation at room temperature (e.g., 22-25° C.), lysis was completed with brief sonication. The lysate was centrifuged and the supernatant was desalted by passage through Sephadex G25 equilibrated with HKM. Partial purification was obtained by affinity chromatography on CoA-Agarose (Sigma). The column was equilibrated with HKM and the clarified extract was allowed to pass through under hydrostatic pressure. Non-binding proteins were removed by washing the column with HKM, and GAT was eluted with HKM containing 1 mM coenzyme A. This procedure provided 4-fold purification. At this stage, approximately 65% of the protein staining observed on an SDS polyacrylamide gel loaded with crude lysate was due to GAT, with another 20% due to chloramphenicol acetyltransferase encoded by the vector.

Purification to homogeneity was obtained by gel filtration of the partially purified protein through Superdex 75 (Pharmacia). The mobile phase was HKM, in which GAT activity eluted at a volume corresponding to a molecular weight of 17 kD. This material was homogeneous as judged by Coomassie staining of a 3 μg sample of GAT subjected to SDS polyacrylamide gel electrophoresis on a 1 mm thick, 12% acrylamide gel. Purification was accompanied by a 6-fold increase in specific activity.

The apparent K_(M) for glyphosate was determined in reaction mixtures containing saturating (200 μM) acetyl CoA, varying concentrations of glyphosate, and 1 μM purified GAT in buffer containing 5 mM morpholine adjusted to pH 7.7 with acetic acid and 20% ethylene glycol. Initial reaction rates were determined by continuous monitoring of the hydrolysis of the thioester bond of acetyl CoA at 235 nm (ε = 3.4 mM⁻¹ cm⁻¹). Hyperbolic saturation kinetics were observed (FIG. 5), from which an apparent K_(M) of 2.9 ± 0.2 (SD) mM was obtained.
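The conversion from absorbance to reaction rate follows the Beer-Lambert law, A = ε·c·l. A minimal Python sketch, using the ε of 3.4 mM⁻¹ cm⁻¹ given above and assuming a 1 cm path length unless another value is supplied (in a microplate well the path length depends on the fill volume and would need to be determined), converts a slope in absorbance units per minute into a rate and a per-enzyme turnover. The function names are illustrative.

```python
# Sketch: convert an A235 slope into a reaction rate via Beer-Lambert (A = eps * c * l).
# Assumption: 1 cm path length unless another value is supplied.

EPSILON_235 = 3.4  # mM^-1 cm^-1, the extinction coefficient quoted in the text


def rate_um_per_min(delta_a_per_min: float, path_cm: float = 1.0) -> float:
    """Acetyl CoA thioester consumed per minute (uM/min) from the A235 slope.

    The magnitude of the slope is used, so the sign convention of the
    instrument output does not matter.
    """
    mm_per_min = abs(delta_a_per_min) / (EPSILON_235 * path_cm)
    return mm_per_min * 1000.0  # mM -> uM


def turnover_per_min(rate_um: float, enzyme_um: float) -> float:
    """Turnover number (min^-1): rate divided by enzyme concentration."""
    return rate_um / enzyme_um


if __name__ == "__main__":
    v = rate_um_per_min(-0.034)  # a falling A235 trace, 1 cm path
    print(f"{v:.1f} uM/min, {turnover_per_min(v, 1.0):.1f} per min at 1 uM GAT")
```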

The apparent K_(M) for acetyl CoA was determined in reaction mixtures containing 5 mM glyphosate, varying concentrations of acetyl CoA, and 0.19 μM GAT in buffer containing 5 mM morpholine adjusted to pH 7.7 with acetic acid and 50% methanol. Initial reaction rates were determined using mass spectrometric detection of N-acetylglyphosate. Aliquots of 5 μl were repeatedly injected into the instrument, and reaction rates were obtained by plotting the integrated peak area against reaction time (FIG. 6). Hyperbolic saturation kinetics were observed (FIG. 7), from which an apparent K_(M) of 2 μM was derived. From the V_(max) obtained at a known enzyme concentration, a k_(cat) of 6 min⁻¹ was calculated.
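The rate calculation described above can be sketched in Python: fit a straight line to integrated peak area versus reaction time, convert the slope to concentration units with a response factor from N-acetylglyphosate standards, and divide V_(max) by the enzyme concentration to obtain k_(cat). The response factor and the example readings are hypothetical; the final line uses only the numbers in the text, since a k_(cat) of 6 min⁻¹ at 0.19 μM GAT implies a V_(max) of about 1.14 μM/min.

```python
# Sketch: initial rate from repeated MS injections, then kcat = Vmax / [E].
# Assumption: peak area is proportional to N-acetylglyphosate concentration,
# with a response factor (area units per uM) taken from standards.


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def initial_rate_um_per_min(times_min, peak_areas, area_per_um):
    """Product formation rate (uM/min) from the peak area-vs-time slope."""
    return least_squares_slope(times_min, peak_areas) / area_per_um


def kcat_per_min(vmax_um_per_min, enzyme_um):
    """Turnover number from Vmax at a known enzyme concentration."""
    return vmax_um_per_min / enzyme_um


if __name__ == "__main__":
    # Hypothetical readings: injections every 2 min, 100 area units per uM.
    rate = initial_rate_um_per_min([0, 2, 4, 6], [0, 200, 400, 600], 100.0)
    print(f"initial rate {rate:.2f} uM/min")
    print(f"kcat {kcat_per_min(1.14, 0.19):.1f} min^-1")
```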

Example 3 Mass Spectrometry (MS) Screening Process

Samples (5 μl) were drawn from a 96-well microtiter plate at a rate of one sample every 26 seconds and injected into the mass spectrometer (Micromass Quattro LC triple quadrupole mass spectrometer) without any separation. The sample was carried into the mass spectrometer by a mobile phase of water/methanol (50:50) at a flow rate of 500 μl/min. Each injected sample was ionized by negative electrospray ionization (needle voltage, −3.5 kV; cone voltage, 20 V; source temperature, 120° C.; desolvation temperature, 250° C.; cone gas flow, 90 L/hr; desolvation gas flow, 600 L/hr). The molecular ions (m/z 210) formed during this process were selected by the first quadrupole for collision-induced dissociation (CID) in the second quadrupole, where the pressure was set at 5×10⁻⁴ mbar and the collision energy was adjusted to 20 eV. The third quadrupole was set to allow only one of the daughter ions (m/z 124) produced from the parent ions (m/z 210) to reach the detector for signal recording. The first and third quadrupoles were set at unit resolution, while the photomultiplier was operated at 650 V. Pure N-acetylglyphosate standards were used for comparison, and peak integration was used to estimate concentrations. N-acetylglyphosate concentrations below 200 nM could be detected by this method.
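Estimating concentrations from integrated peak areas amounts to a linear calibration against the N-acetylglyphosate standards. The Python sketch below fits area = slope × concentration + intercept to a set of standards and inverts the line for unknowns; it assumes the detector response is linear over the calibrated range, and the standard concentrations and areas shown are hypothetical. At one injection every 26 seconds, a full 96-well plate takes 96 × 26 = 2,496 s, or about 42 minutes.

```python
# Sketch: linear calibration of MS peak area against N-acetylglyphosate standards.
# Assumption: the m/z 210 -> 124 response is linear over the calibrated range.

INJECTION_INTERVAL_S = 26  # from the text: one sample every 26 seconds


def fit_calibration(conc_um, areas):
    """Return (slope, intercept) of an ordinary least-squares line area = m*c + b."""
    n = len(conc_um)
    mean_c = sum(conc_um) / n
    mean_a = sum(areas) / n
    scc = sum((c - mean_c) ** 2 for c in conc_um)
    sca = sum((c - mean_c) * (a - mean_a) for c, a in zip(conc_um, areas))
    slope = sca / scc
    return slope, mean_a - slope * mean_c


def concentration_um(area, slope, intercept):
    """Invert the calibration line for an unknown sample."""
    return (area - intercept) / slope


def plate_minutes(n_wells=96):
    """Instrument time to inject every well of a plate, in minutes."""
    return n_wells * INJECTION_INTERVAL_S / 60.0


if __name__ == "__main__":
    standards_um = [0.5, 1.0, 2.0, 4.0]                # hypothetical standards
    standard_areas = [550.0, 1050.0, 2050.0, 4050.0]   # hypothetical peak areas
    m, b = fit_calibration(standards_um, standard_areas)
    print(f"unknown: {concentration_um(2550.0, m, b):.2f} uM")
    print(f"plate run time: {plate_minutes():.1f} min")
```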

Example 4 Detection of Native or Low Activity GAT Enzymes

Native or low-activity GAT enzymes typically have a k_(cat) of approximately 1 min⁻¹ and a K_(M) for glyphosate of 1.5-10 mM. The K_(M) for acetyl CoA is typically less than 25 μM.

Bacterial cultures were grown in rich medium in deep 96-well plates, and cells from 0.5 ml of stationary-phase culture were harvested by centrifugation, washed with 5 mM morpholine acetate, pH 8, and resuspended in 0.1 ml of reaction mix containing 200 μM ammonium acetyl CoA, 5 mM ammonium glyphosate, and 5 μg/ml PMBS (Sigma) in 5 mM morpholine acetate, pH 8. The PMBS permeabilizes the cell membrane, allowing the substrates and products to move between the cells and the buffer without releasing the entire cellular contents. Reactions were carried out at 25-37° C. for 1-48 hours. The reactions were quenched with an equal volume of 100% ethanol, and the entire mixture was filtered on a 0.45 μm MAHV Multiscreen filter plate (Millipore). Samples were analyzed using a mass spectrometer as described above and compared to synthetic N-acetylglyphosate standards.

Example 5 Detection of High Activity GAT Enzymes

High-activity GAT enzymes typically have a k_(cat) of up to 400 min⁻¹ and a K_(M) below 0.1 mM glyphosate.

Genes coding for GAT enzymes were cloned into the E. coli expression vector pQE80 (Qiagen) and introduced into E. coli strain XL1 Blue (Stratagene). Cultures were grown in 150 μl of rich medium (LB with 50 μg/ml carbenicillin) in shallow U-bottom 96-well polystyrene plates to late-log phase and diluted 1:9 with fresh medium containing 1 mM IPTG (USB). After 4-8 hours of induction, cells were harvested, washed with 5 mM morpholine acetate, pH 6.8, and resuspended in an equal volume of the same morpholine buffer. Reactions were carried out with up to 10 μl of washed cells. At higher activity levels, the cells were first diluted up to 1:200, and 5 μl was added to 100 μl of reaction mix. To measure GAT activity, the same reaction mix as described for low-activity enzymes was used. However, for detecting highly active GAT enzymes, the glyphosate concentration was reduced to 0.15-0.5 mM, the pH was reduced to 6.8, and reactions were carried out for 1 hour at 37° C. Reaction workup and MS detection were as described herein.

Example 6 Purification of GAT Enzymes

Enzyme purification was achieved by affinity chromatography of cell lysates on CoA-Agarose and gel filtration on Superdex 75. Quantities of up to 10 mg of purified GAT enzyme were obtained as follows. A 100-ml culture of E. coli carrying a GAT polynucleotide on a pQE80 vector, grown overnight in LB containing 50 μg/ml carbenicillin, was used to inoculate 1 L of LB plus 50 μg/ml carbenicillin. After 1 hr, IPTG was added to 1 mM, and the culture was grown for a further 6 hr. Cells were harvested by centrifugation. Lysis was effected by suspending the cells in 25 mM HEPES (pH 7.2), 100 mM KCl, 10% methanol (HKM), 0.1 mM EDTA, 1 mM DTT, protease inhibitor cocktail supplied by Sigma-Aldrich, and 1 mg/ml chicken egg lysozyme. After 30 minutes at room temperature, the cells were briefly sonicated. Particulate material was removed by centrifugation, and the lysate was passed through a bed of coenzyme A-Agarose. The column was washed with several bed volumes of HKM, and GAT was eluted in 1.5 bed volumes of HKM containing 1 mM acetyl CoA. GAT in the eluate was concentrated by retention above a Centricon YM 50 ultrafiltration membrane. Further purification was obtained by passing the protein through a Superdex 75 column in a series of 0.6-ml injections. The peak of GAT activity eluted at a volume corresponding to a molecular weight of 17 kD. This method purified the GAT enzyme to homogeneity with >85% recovery. A similar procedure was used to obtain 0.1 to 0.4 mg quantities of up to 96 shuffled variants at a time: the volume of induced culture was reduced to 1 to 10 ml, coenzyme A-Agarose affinity chromatography was performed in 0.15-ml columns packed in an MAHV filter plate (Millipore), and Superdex 75 chromatography was omitted.

Example 7 Standard Protocol for Determination of k_(cat) and K_(M)

The k_(cat) and K_(M) for glyphosate of purified protein were determined using a continuous spectrophotometric assay in which hydrolysis of the thioester bond of acetyl CoA was monitored at 235 nm. Reactions were performed at ambient temperature (about 23° C.) in the wells of a 96-well assay plate, with the following components present in a final volume of 0.3 ml: 20 mM HEPES, pH 6.8, 10% ethylene glycol, 0.2 mM acetyl CoA, and various concentrations of ammonium glyphosate. When comparing the kinetics of two GAT enzymes, both enzymes were assayed under the same conditions, e.g., both at 23° C. The k_(cat) was calculated from V_(max) and the enzyme concentration, determined by Bradford assay. K_(M) was calculated from the initial reaction rates obtained at glyphosate concentrations ranging from 0.125 to 10 mM, using the Lineweaver-Burk transformation of the Michaelis-Menten equation. k_(cat)/K_(M) was determined by dividing the value determined for k_(cat) by the value determined for K_(M).
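The calculation described in this example can be sketched as follows. The Python code fits a straight line to the double-reciprocal (Lineweaver-Burk) plot, 1/v = (K_(M)/V_(max))·(1/[S]) + 1/V_(max), recovers V_(max) and K_(M) from the slope and intercept, and then derives k_(cat) and k_(cat)/K_(M). The enzyme concentration (0.01 μM) and the rates are hypothetical, generated from an ideal Michaelis-Menten curve over the 0.125-10 mM glyphosate range used in the text. Real data carry noise, and because the reciprocal transform weights the lowest-concentration points heavily, a direct nonlinear fit is a common alternative.

```python
# Sketch: Lineweaver-Burk estimate of KM and kcat for glyphosate.
# Assumptions: initial rates v in uM/min, [S] in mM, enzyme concentration in uM.

EXAMPLE_S_MM = [0.125, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0]  # range used in the text


def fit_line(xs, ys):
    """Ordinary least-squares (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def ideal_rates(kcat, km, enzyme_um, s_mm):
    """Noise-free Michaelis-Menten rates (uM/min), for illustration only."""
    return [kcat * enzyme_um * s / (km + s) for s in s_mm]


def lineweaver_burk(s_mm, v_um_per_min, enzyme_um):
    """Return (kcat min^-1, KM mM, kcat/KM mM^-1 min^-1) from a double-reciprocal fit."""
    inv_s = [1.0 / s for s in s_mm]
    inv_v = [1.0 / v for v in v_um_per_min]
    slope, intercept = fit_line(inv_s, inv_v)  # slope = KM/Vmax, intercept = 1/Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    kcat = vmax / enzyme_um
    return kcat, km, kcat / km


if __name__ == "__main__":
    enzyme_um = 0.01  # hypothetical enzyme concentration
    v = ideal_rates(296.0, 0.65, enzyme_um, EXAMPLE_S_MM)
    kcat, km, eff = lineweaver_burk(EXAMPLE_S_MM, v, enzyme_um)
    print(f"kcat={kcat:.1f} min^-1, KM={km:.3f} mM, kcat/KM={eff:.0f} mM^-1 min^-1")
```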

Using this methodology, kinetic parameters were determined for a number of GAT polypeptides exemplified herein. For example, the k_(cat), K_(M) and k_(cat)/K_(M) for the GAT polypeptide corresponding to SEQ ID NO: 445 were determined to be 322 min⁻¹, 0.5 mM and 660 mM⁻¹ min⁻¹, respectively, using the assay conditions described above. The k_(cat), K_(M) and k_(cat)/K_(M) for the GAT polypeptide corresponding to SEQ ID NO: 457 were determined to be 118 min⁻¹, 0.1 mM and 1184 mM⁻¹ min⁻¹, respectively, using the same assay conditions. The k_(cat), K_(M) and k_(cat)/K_(M) for the GAT polypeptide corresponding to SEQ ID NO: 300 were determined to be 296 min⁻¹, 0.65 mM and 456 mM⁻¹ min⁻¹, respectively, using the same assay conditions. One of skill in the art can use these numbers to confirm that a GAT activity assay generates kinetic parameters suitable for comparison with the values given herein. For example, conditions used to compare a test GAT with the GAT polypeptides exemplified herein should yield the same kinetic constants for SEQ ID NO: 300, 445, and 457 (within normal experimental variance) as those reported herein.
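The calibration check described above can be expressed as a simple comparison. The Python sketch below stores the reference constants reported in this example and flags any measured constant that deviates from its reference by more than a chosen relative tolerance. The 20% tolerance is an arbitrary placeholder standing in for "normal experimental variance", which the text does not quantify, and the function name is hypothetical.

```python
# Sketch: check that an assay reproduces the reference GAT constants reported above.
# Assumption: "normal experimental variance" is represented by a relative tolerance.

REFERENCE = {
    # SEQ ID NO: (kcat min^-1, KM mM, kcat/KM mM^-1 min^-1)
    300: (296.0, 0.65, 456.0),
    445: (322.0, 0.5, 660.0),
    457: (118.0, 0.1, 1184.0),
}


def assay_agrees(measured: dict, rel_tol: float = 0.20) -> dict:
    """Map each SEQ ID NO to True if all three measured constants are within rel_tol."""
    results = {}
    for seq_id, ref in REFERENCE.items():
        got = measured.get(seq_id)
        results[seq_id] = got is not None and all(
            abs(g - r) <= rel_tol * abs(r) for g, r in zip(got, ref)
        )
    return results


if __name__ == "__main__":
    trial = {300: (280.0, 0.70, 400.0), 445: (330.0, 0.48, 690.0), 457: (118.0, 0.25, 470.0)}
    print(assay_agrees(trial))
```

In the hypothetical trial, SEQ ID NO: 457 fails because its measured K_(M) is far from 0.1 mM, which would suggest the assay conditions differ from those used here.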

The K_(M) for acetyl CoA was measured using the mass spectrometry method with repeated sampling during the reaction. Acetyl CoA and glyphosate (ammonium salts) were placed as 50-fold-concentrated stock solutions into a well of a mass spectrometry sample plate. Reactions were initiated by the addition of enzyme appropriately diluted in a volatile buffer such as morpholine acetate or ammonium carbonate, pH 6.8 or 7.7. The sample was repeatedly injected into the instrument, and initial rates were calculated from plots of peak area against reaction time. K_(M) was calculated as for glyphosate.

Example 8 Selection of Transformed E. coli

An evolved GAT gene (a chimera with a native B. licheniformis ribosome binding site, AACTGAAGGAGGAATCTC (SEQ ID NO: 515), attached directly to the 5′ end of the GAT coding sequence) was cloned into the expression vector pQE80 (Qiagen) between the EcoRI and HindIII sites, resulting in the plasmid pMAXY2190 (FIG. 11). This eliminated the His tag domain from the plasmid and retained the β-lactamase gene conferring resistance to the antibiotics ampicillin and carbenicillin. pMAXY2190 was electroporated (BioRad Gene Pulser) into XL1 Blue (Stratagene) E. coli cells. The cells were suspended in SOC rich medium and allowed to recover for one hour. The cells were then gently pelleted, washed once with M9 minimal medium lacking aromatic amino acids (12.8 g/L Na₂HPO₄·7H₂O, 3.0 g/L KH₂PO₄, 0.5 g/L NaCl, 1.0 g/L NH₄Cl, 0.4% glucose, 2 mM MgSO₄, 0.1 mM CaCl₂, 10 mg/L thiamine, 10 mg/L proline, 30 mg/L carbenicillin), and resuspended in 20 ml of the same M9 medium. After overnight growth at 37° C. and 250 rpm, equal volumes of cells were plated on either M9 medium or M9 medium plus 1 mM glyphosate. The pQE80 vector with no GAT gene was similarly introduced into E. coli cells and plated for single colonies for comparison. Table 3 summarizes the results, demonstrating that GAT activity allows selection and growth of transformed E. coli cells with less than 1% background. Note that no IPTG induction was necessary for sufficient GAT activity to allow growth of transformed cells. Transformation was verified by re-isolation of pMAXY2190 from the E. coli cells grown in the presence of glyphosate.

TABLE 3 Glyphosate selection of pMAXY2190 in E. coli (number of colonies)
Plasmid      M9 − glyphosate    M9 + 1 mM glyphosate
pMAXY2190    568                512
pQE80        324                3
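One way to read the "less than 1% background" statement is as the fraction of vector-only (pQE80) cells that still formed colonies on glyphosate. The short Python sketch below computes that fraction, and the corresponding survival of pMAXY2190 transformants on glyphosate, from the colony counts in Table 3; this interpretation of "background" is an assumption.

```python
# Sketch: background and plating efficiency from Table 3 colony counts.
# Assumption: background = vector-only colonies on glyphosate / vector-only colonies without it.

COUNTS = {
    # plasmid: (colonies on M9 - glyphosate, colonies on M9 + 1 mM glyphosate)
    "pMAXY2190": (568, 512),
    "pQE80": (324, 3),
}


def percent_surviving(plasmid: str) -> float:
    """Colonies on glyphosate as a percentage of colonies without glyphosate."""
    without_gly, with_gly = COUNTS[plasmid]
    return 100.0 * with_gly / without_gly


if __name__ == "__main__":
    print(f"pQE80 background: {percent_surviving('pQE80'):.2f}%")
    print(f"pMAXY2190 on glyphosate: {percent_surviving('pMAXY2190'):.1f}%")
```

Read this way, about 0.93% of vector-only cells escape selection, while roughly 90% of pMAXY2190 transformants grow on 1 mM glyphosate.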

Example 9 Selection of Transformed Plant Cells

Agrobacterium-mediated transformation of plant cells occurs at low efficiency. To allow propagation of transformed cells while inhibiting proliferation of non-transformed cells, a selectable marker is needed. Antibiotic markers for kanamycin and hygromycin and the herbicide-modifying gene bar, which detoxifies the herbicidal compound phosphinothricin, are examples of selectable markers used in plants (Methods in Molecular Biology, 1995, 49:9-18). Here we demonstrate that GAT activity serves as an efficient selectable marker for plant transformation. An evolved GAT gene (0_5B8), SEQ ID NO: 190, was cloned between a plant promoter (enhanced strawberry vein banding virus) and a ubiquitin terminator and introduced into the T-DNA region of the binary vector pMAXY3793, suitable for transformation of plant cells via Agrobacterium tumefaciens EHA105, as shown in FIG. 12. A screenable GUS marker was present in the T-DNA to allow confirmation of transformation. Transgenic tobacco shoots were generated using glyphosate as the only selective agent.

Axillary buds of Nicotiana tabacum L. Xanthi were subcultured every 2-3 weeks on half-strength MS medium with sucrose (1.5%) and Gelrite (0.3%) under 16-h light (35-42 μEinsteins m⁻² s⁻¹, cool white fluorescent lamps) at 24° C. Young leaves were excised from plants after 2-3 weeks of subculture and cut into 3×3 mm segments. A. tumefaciens EHA105 was inoculated into LB medium and grown overnight to a density of A600 = 1.0. Cells were pelleted at 4,000 rpm for 5 minutes and resuspended in 3 volumes of liquid co-cultivation medium composed of Murashige and Skoog (MS) medium (pH 5.2) with 2 mg/L N6-benzyladenine (BA), 1% glucose and 400 μM acetosyringone. The leaf pieces were then fully submerged in 20 ml of the A. tumefaciens suspension in 100×25 mm Petri dishes for 30 min, blotted with autoclaved filter paper, placed on solid co-cultivation medium (0.3% Gelrite), and incubated as described above. After 3 days of co-cultivation, 20-30 segments were transferred to basal shoot induction (BSI) medium composed of MS solid medium (pH 5.7) with 2 mg/L BA, 3% sucrose, 0.3% Gelrite, 0-200 μM glyphosate, and 400 μg/ml Timentin.

After 3 weeks, shoots were clearly evident on explants placed on media with no glyphosate, regardless of the presence or absence of the GAT gene. T-DNA transfer from both constructs was confirmed by GUS histochemical staining of leaves from regenerated shoots. Glyphosate concentrations greater than 20 μM completely inhibited shoot formation from explants lacking a GAT gene. Explants infected with A. tumefaciens carrying the GAT construct regenerated shoots at glyphosate concentrations up to 200 μM (the highest level tested). Transformation was confirmed by GUS histochemical staining and by PCR amplification of the GAT gene using primers annealing to the promoter and 3′ regions. The results are summarized in Table 4.

TABLE 4 Tobacco shoot regeneration with glyphosate selection (% shoot regeneration)
Transferred genes    0 μM    20 μM    40 μM    80 μM    200 μM
GUS                  100     0        0        0        0
gat and GUS          100     60       30       5        3

Example 10 Glyphosate Selection of Transformed Yeast Cells

Selection markers for yeast transformation are usually auxotrophic genes that allow growth of transformed cells on a medium lacking a specific amino acid or nucleotide. Because Saccharomyces cerevisiae is sensitive to glyphosate, GAT can also be used as a selectable marker. To demonstrate this, an evolved GAT gene (0_6D10), SEQ ID NO: 196, is cloned from the T-DNA vector pMAXY3793 (described in Example 9) as a PstI-ClaI fragment containing the entire coding region and ligated into PstI-ClaI-digested p424TEF (Gene, 1995, 156:119-122), as shown in FIG. 13. This plasmid contains an E. coli origin of replication and a gene conferring carbenicillin resistance, as well as a TRP1 tryptophan auxotrophic selectable marker for yeast transformation.

The GAT-containing construct is transformed into E. coli XL1 Blue (Stratagene) and plated on LB carbenicillin (50 μg/ml) agar medium. Plasmid DNA is prepared and used to transform yeast strain YPH499 (Stratagene) using a transformation kit (Bio101). Equal amounts of transformed cells are plated on CSM-YNB-glucose medium (Bio101) lacking all aromatic amino acids (tryptophan, tyrosine, and phenylalanine) with added glyphosate. For comparison, p424TEF lacking the GAT gene is also introduced into YPH499 and plated as described. The results demonstrate that GAT activity functions as an efficient selectable marker. The presence of the GAT-containing vector in glyphosate-selected colonies can be confirmed by re-isolation of the plasmid and restriction digest analysis.

Example 11 Herbicide Spray Tests of GAT Expressing Tobacco Plants

Tobacco shoots generated as described in Example 9 were excised from the explants and transferred to basal root induction (BRI) medium composed of half-strength Murashige and Skoog (MS) medium, pH 5.7, with 1.5% sucrose, 0.3% Gelrite, 0-200 μM glyphosate and 400 μg/ml Timentin. Rooted plants and axillary shoots were clonally propagated by cutting the stem and transferring it to fresh BRI medium until the desired number of clones was obtained. Rooted plants were carefully removed from the solid medium, and their roots were washed to remove any remaining Gelrite before the plants were placed into small pots of soil. A protective plastic cover was kept over the plants for at least one week until the plants were well established.

To determine whether GAT-expressing tobacco plants could tolerate simulated field-rate sprays of glyphosate, clonal lines of several events per GAT variant were tested. A typical test was set up as follows. One clone from each event was sprayed with 1 ml of solution containing the isopropylamine salt of glyphosate (Sigma P5671) and 0.125% Triton X-100, pH 6.8, such that the amount of active ingredient sprayed was equivalent to that present in commercial glyphosate products. For example, to achieve 32 oz/acre (1×) of herbicide containing 40% active ingredient ("ai"), 2.4 μl of 40% ai formulation was diluted into 1 ml of water and sprayed on a plant in a 4-inch square pot (16 in²). A mock application (0×) with surfactant only was also included. In some cases a second spray was applied 1-4 weeks later. Plants were kept in controlled growth rooms at 25° C. and 70% humidity with 16 hr light.
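The 2.4 μl figure follows from scaling the per-acre application rate down to the pot surface. The Python sketch below reproduces that arithmetic, assuming that "oz" means US fluid ounces (29.5735 ml) of formulated product and that one acre is 43,560 ft² (6,272,640 in²); the function name is illustrative. The same function gives the volumes for the 0.5× to 8× multiples used in this example.

```python
# Sketch: microliters of formulated glyphosate product per pot for a given field rate.
# Assumptions: US fluid ounces and a pot area measured in square inches.

ML_PER_US_FL_OZ = 29.5735
SQ_IN_PER_ACRE = 43_560 * 144  # 43,560 ft^2 x 144 in^2/ft^2 = 6,272,640 in^2


def product_ul_per_pot(rate_fl_oz_per_acre: float, pot_area_sq_in: float,
                       multiple: float = 1.0) -> float:
    """Volume of formulation (uL) delivering `multiple` x the field rate to one pot."""
    ml_per_sq_in = rate_fl_oz_per_acre * ML_PER_US_FL_OZ / SQ_IN_PER_ACRE
    return ml_per_sq_in * pot_area_sq_in * multiple * 1000.0  # mL -> uL


if __name__ == "__main__":
    for x in (0.5, 1, 2, 4, 8):
        ul = product_ul_per_pot(32, 16, x)  # 32 oz/acre, 4-inch square pot
        print(f"{x}x: {ul:.1f} uL of 40% ai product in 1 mL spray")
```

At 1× the result is about 2.41 μl per 16 in² pot, matching the volume given above.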

In this example, 10 events confirmed positive for GAT0_6D10 (SEQ ID NO: 196), 10 for GAT0_5D3 (SEQ ID NO: 193), 8 events for GAT0_5B8 (SEQ ID NO: 190), and plants transformed with the vector only (no GAT) were clonally propagated, transferred to soil and sprayed when the plants had an average of 5 leaves. Seed-grown wild-type plants were also sprayed. After two weeks, the vector-only and seed-grown plants sprayed with 0.5×, 2× or 4× glyphosate stopped growing, wilted, and turned brown. Each of the transgenic GAT plants survived the spraying procedure without signs of glyphosate damage such as chlorosis, leaf elongation, stunting, or browning. All 0× plants were healthy, including the non-GAT control plants. Three weeks later, all of the surviving plants were sprayed with an 8× dose. The 0× control plants died within two weeks. Again, all GAT plants survived.

Tobacco plants transformed with GAT and selected on glyphosate were fertile. Flowering and seed set were not detectably different from those of wild-type plants.

Example 12 Mendelian Inheritance of GAT Gene and Glyphosate-Tolerant Phenotype

Mendelian inheritance of the GAT gene and the glyphosate-tolerant phenotype was demonstrated with transformed Arabidopsis. Columbia-type Arabidopsis plants were grown and transformed by the dipping method (Clough, S. J. and Bent, A. F. (1998) Plant J. 16(6): 735-43) with a construct containing the GAT variant called chimera (SEQ ID NO: 16). Bulk seed was collected and GAT plants were confirmed by PCR with primers specific to the insert within the T-DNA. T1 seed from individual events was sown on soil at 10-30 seeds per 2-inch square pot. When the first set of true leaves was emerging, pots were sprayed with glyphosate equivalent to 0.5× and 1× commercial product (as calculated in Example 11). After two weeks, segregation of the transgene and the tolerant phenotype was evident, as shown in Table 5.

TABLE 5
Summary of segregation data for 0.5× and 1× glyphosate-tolerant T1 Arabidopsis

Chimera event (SEQ ID NO: 16) | # Survivors | # Dead | Segregation ratio
1 | 8 | 11 | 1:1.4
3 | 6 | 22 | 1:3.7
5 | 26 | 2 | 13:1
13 | 10 | 9 | 1:1
65 | 46 | 19 | 2.4:1
Vector only | 0 | 22 | -
Wild-type | 0 | 29 | -

Ratios near 3:1 indicate a single segregating dominant event. Ratios greater than 3:1 indicate several segregating inserts. Ratios less than 3:1 can be due to small sample size effects, incomplete dominance, or position effects that render expression too low to confer herbicide tolerance. Compared to the controls, it was clear that the GAT gene was transmitted to the T1 generation and conferred glyphosate tolerance.
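One common way to judge whether a survivor:dead count is "near 3:1" is a chi-square goodness-of-fit test with one degree of freedom. This test is not part of the original example. The sketch below applies it to the Table 5 counts purely as an illustration. The critical value 3.841 corresponds to α = 0.05, and no continuity correction is applied, so results for classes with expected counts below about 5 should be read cautiously.

```python
CHI2_CRITICAL_1DF = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_vs_ratio(survivors: int, dead: int, ratio=(3, 1)) -> float:
    """Pearson chi-square statistic comparing observed survivor:dead counts
    with the counts expected under `ratio` (default 3:1)."""
    total = survivors + dead
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]
    observed = [survivors, dead]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Survivor and dead counts for the chimera events in Table 5.
TABLE5 = {1: (8, 11), 3: (6, 22), 5: (26, 2), 13: (10, 9), 65: (46, 19)}

for event, (alive, dead) in TABLE5.items():
    chi2 = chi_square_vs_ratio(alive, dead)
    verdict = "not different from 3:1" if chi2 < CHI2_CRITICAL_1DF else "departs from 3:1"
    print(f"event {event:>2}: {alive}:{dead}  chi2 = {chi2:6.2f}  {verdict}")
```

Under this test, only event 65 (46:19, chi-square of about 0.62) is statistically indistinguishable from a single dominant insert at α = 0.05.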

Example 13 Production of Glyphosate-Resistant Maize Expressing GAT Transgenes

Maize plants expressing GAT variant transgenes were produced using the methods described in U.S. Pat. No. 5,981,849, which is incorporated herein by reference. Specifically, Agrobacterium tumefaciens vectors were constructed according to methods known in the art. Each vector contained an insert having a ubiquitin promoter and intron, a GAT variant and a PinII terminator. Maize immature embryos were excised and infected with an Agrobacterium tumefaciens vector containing the GAT variant of interest. After infection, embryos were transferred to and cultured in co-cultivation medium. After co-cultivation, the infected immature embryos were transferred onto media containing 1.0 mM glyphosate (Roundup ULTRA MAX™). This selection continued until actively growing putative transgenic calli were identified. The putative transgenic callus tissues were sampled for PCR and Western assay (data not shown) to confirm the presence of the GAT gene. The putative transgenic callus tissues were maintained on 1.0 mM glyphosate selection media for further growth and selection before plant regeneration. At regeneration, callus tissues confirmed to be transgenic were transferred onto maturation medium containing 0.1 mM glyphosate and cultured for somatic embryo maturation. Mature embryos were then transferred onto regeneration medium containing 0.1 mM glyphosate for shoot and root formation. After shoots and roots emerged, individual plantlets were transferred into tubes with rooting medium containing 0.1 mM glyphosate. Plantlets with established shoots and roots were transplanted into pots in the greenhouse for further growth, the generation of T0 spray data and the production of T1 seed.

In order to evaluate the level of glyphosate resistance of the transgenic maize plants expressing the GAT variant transgenes, T0 plants were sprayed with glyphosate (Roundup ULTRA MAX™) in the greenhouse. Plant resistance levels were evaluated by plant discoloration scores and plant height measurements. Plant discoloration and plant height were evaluated according to the following scales:

Discoloration score at 1, 2, 3 and 4 weeks after spray with glyphosate:
9 = no leaf/stem discoloration
7 = minor leaf/stem discoloration
5 = worse leaf/stem discoloration
3 = severely discolored plant or dying plant
1 = dead plant

Plant height measurements:
before spraying with glyphosate
after spraying with glyphosate at 1, 2, 3 and 4 weeks
mature plants (at tasseling)
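The percentage columns in Tables 6 through 8 report, for each construct, the share of tested events at a given discoloration score, with the event count in parentheses. A minimal sketch of that tally follows. The score list is hypothetical, and grouping an intermediate score such as 8 with 7 is an assumption made here for illustration.

```python
def summarize_scores(scores):
    """Return 'pct% (count)' strings for events scoring 9, 7-8, and below 7."""
    n = len(scores)
    counts = {
        "@ 9": sum(1 for s in scores if s == 9),
        "@ 7-8": sum(1 for s in scores if 7 <= s < 9),
        "@ <7": sum(1 for s in scores if s < 7),
    }
    # Percentages are rounded to whole numbers, as in the tables.
    return {label: f"{round(100 * c / n)}% ({c})" for label, c in counts.items()}


# Hypothetical scores for 20 events of one construct, 7 days after a 4x spray.
scores = [9] * 5 + [7] * 12 + [5] * 2 + [1]
print(summarize_scores(scores))
```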

Two plants were sent to the greenhouse from each event (independent transgenic callus) listed in Table 6. Plant 1 was kept for seed production and was not sprayed with glyphosate. Plant 2 was sprayed with 4× glyphosate (1× glyphosate = 26 ounces/acre) 14 days after transplanting. The T0 plant discoloration scores with the 4× spray at 7 and 14 days after the spray are shown in Tables 6 and 7. Height data at tasseling are shown in FIG. 14. An additional experiment was performed in which T0 plants were sprayed with 6× glyphosate. The T0 plant discoloration scores with the 6× spray at 10 days after the spray are shown in Table 8.

TABLE 6
Resistance scores at 7 days after treatment with 4× glyphosate

Construct | # events tested with 4× | % events @ 9 | % events @ 7 | % events @ <7
18534 (SEQ ID NO: 196) | 169 | 30% (50) | 59% (101) | 11% (18)
18537 (SEQ ID NO: 193) | 72 | 40% (29) | 54% (39) | 6% (4)
18540 (SEQ ID NO: 190) | 111 | 32% (36) | 61% (67) | 7% (8)
Total | 352 | 33% (115) | 59% (207) | 8% (30)

TABLE 7
Resistance scores at 14 days after treatment with 4× glyphosate

Construct | # events tested with 4× | % events @ 9
18534 (SEQ ID NO: 196) | 169 | 29% (49)
18537 (SEQ ID NO: 193) | 72 | 50% (36)
18540 (SEQ ID NO: 190) | 111 | 29% (32)
Total | 352 | 33% (117)

TABLE 8
Resistance scores at 10 days after treatment with 6× glyphosate

Construct | # events tested with 6× | % events with no damage after glyphosate treatment (score = 9)
19286 (SEQ ID NO: 814) | 312 | 51% (160)
19288 (SEQ ID NO: 549) | 310 | 52% (163)
19900 (SEQ ID NO: 738) | 231 | 56% (129)
19902 (SEQ ID NO: 638) | 230 | 42% (96)
21895 (SEQ ID NO: 848) | 55 | 30% (17)
21896 (SEQ ID NO: 912) | 61 | 61% (37)
21905 (SEQ ID NO: 906) | 32 | 70% (25)
Total | 1231 | 51% (627)

Example 14 GAT is also an Acyltransferase

The ability of GAT variants B6 (SEQ ID NO:7), 0_6D10 (SEQ ID NO:448), 17-15H3 (SEQ ID NO:601), and 20-8H12c (SEQ ID NO:817) to transfer the propionyl group from propionyl CoA to glyphosate was tested in reaction mixtures containing 5 mM glyphosate or no glyphosate. Propionyl CoA was present at 1 mM. After 30 minutes the reactions were terminated, and the presence of free CoA was determined by the addition of DTNB. All variants showed glyphosate-dependent release of CoA from propionyl CoA. These results indicate that GAT also functions as an acyltransferase.
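DTNB (Ellman's reagent) reacts with free thiols, such as the coenzyme A released when an acyl group is transferred to glyphosate, and produces the yellow TNB anion, which absorbs at 412 nm. A minimal Beer-Lambert sketch for converting that absorbance to a CoA concentration follows. The molar absorptivity of 14,150 M⁻¹ cm⁻¹ for TNB at 412 nm, the 1 cm path length, and the absorbance readings are assumptions for illustration, not values from this example.

```python
TNB_EPSILON_412 = 14_150.0  # M^-1 cm^-1, commonly used value for TNB at 412 nm (assumed)


def free_thiol_uM(a412: float, a412_blank: float, path_cm: float = 1.0,
                  dilution: float = 1.0) -> float:
    """Free-thiol (CoA) concentration in uM from a blank-corrected A412 reading."""
    corrected = a412 - a412_blank
    return corrected / (TNB_EPSILON_412 * path_cm) * dilution * 1e6


# Hypothetical readings: the no-glyphosate reaction serves as the blank, so the
# result estimates glyphosate-dependent CoA release.
with_glyphosate = 0.80
without_glyphosate = 0.05
print(f"{free_thiol_uM(with_glyphosate, without_glyphosate):.1f} uM CoA")
```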

Example 15 T1 Studies of Glyphosate-Resistant Maize Expressing GAT Transgenes

Maize plants expressing GAT variant transgenes 18-28D9b (SEQ ID NO:814) and 17-15H3 (SEQ ID NO:549) were produced using the methods described in Example 13. T1 plants were used to generate glyphosate field tolerance data. The T1 plants were treated in the field with four different glyphosate spray treatments (0×, 4×, 8×, and 4× + 4×) for each event. The plants were sprayed at V3 and V8. Plants were scored 10 days after treatment for leaf discoloration and plant height comparisons as described in Example 13. The T1 field spray data correlated well with the results previously obtained in the greenhouse, as reported in Example 13. T2 seeds were collected for further studies.

Example 16 Thermostability of GAT Polypeptides

A. Effect of Temperature Variation on Glyphosate Tolerance of Glyphosate-Resistant Maize Expressing GAT Transgenes

Maize plants expressing GAT variant transgenes 10_4F2 (SEQ ID NO:203), 17-15H3 (SEQ ID NO:549), and 18-28D9b (SEQ ID NO:814) were produced using the methods described in Example 13. The effect of temperature on glyphosate tolerance was evaluated in T1 plants. The T1 plants were grown in cool/cold (day 14° C., night 8° C.), warm (day 28° C., night 20° C.), and hot (day 37° C., night 20° C.) conditions. T1 plants were sprayed at V2 with four different glyphosate spray treatments (0×, 4×, 6×, and 8×). Plants were scored at 5 and 14 days after treatment for leaf discoloration and plant height comparisons as described in Example 13. Visual observations indicated that glyphosate tolerance is not adversely affected by the range of temperatures tested.

B. Effect of Temperature Variation on GAT Activity In Vitro

In vitro thermostability of several GAT polypeptides (DS3, a native GAT polypeptide corresponding to SEQ ID NO: 8; 6_6D5 (SEQ ID NO: 410); 17-15H3 (SEQ ID NO: 601); 20-8H12 (SEQ ID NO: 739); 22-13B12 (SEQ ID NO: 781); and 401, a native GAT polypeptide corresponding to SEQ ID NO: 6) was evaluated as follows. The enzymes were distributed to 200 µl strip PCR tubes (VWR, San Francisco, Calif.) and incubated in a gradient thermocycler (MJ Research, Watertown, Mass.) for 15 minutes at various temperatures between 30° C. and 60° C., as indicated in FIG. 17. Precipitated protein was removed by centrifugation, and the surviving enzymatic activity of the remaining soluble protein was measured at 22° C. by the continuous spectrophotometric assay described in Example 7. Saturating concentrations of glyphosate (10 mM for DS3 (SEQ ID NO: 8), 401 (SEQ ID NO: 6) and 6_6D5 (SEQ ID NO: 410); 5 mM for 17-15H3 (SEQ ID NO: 601), 20-8H12 (SEQ ID NO: 739), and 22-13B12 (SEQ ID NO: 781)) and of AcCoA (167 µM) were used.

The data are depicted in FIG. 17. Native (i.e., wild-type) GAT polypeptides DS3 (SEQ ID NO: 8) and 401 (SEQ ID NO: 6) retained their activity at temperatures up to about 42 to about 44° C. GAT polypeptides that are not native to any organism (i.e., not wild type) retained their activity at temperatures up to about 47° C. to about 54° C.

The half-lives of several GAT polypeptides were also measured at 37.5° C. according to the following procedure. GAT polypeptides 401 (SEQ ID NO: 6), 17-15H3 (SEQ ID NO: 601), 20-8H12 (SEQ ID NO: 739), 22-13B12 (SEQ ID NO: 781), 22-15B4 (SEQ ID NO: 946) and 22-18C5 (SEQ ID NO: 795) were incubated in a matrix of 25 mM Hepes, pH 7.2, 10 mM KCl and 10% methanol (“HKM”). At various time points, aliquots were withdrawn and assayed in triplicate at 22° C. using the continuous spectrophotometric assay described in Example 7, with saturating concentrations of glyphosate (20 mM for 401, 5 mM for the rest) and AcCoA (167 µM). The standard error at each time point averaged about 2.9%. GAT activity was plotted as a function of incubation time, and the data were fitted to a first-order exponential decay, y = e^(−kx), where y is the fraction of the initial enzyme activity remaining, x is time in hours, and k is the decay constant (h⁻¹). The half-life was calculated as ln 2/k. The data are shown below in Table 9.
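For a first-order decay y = e^(−kx), ln y falls linearly with time. The decay constant k can therefore be estimated from the slope of ln(activity) against time, and the half-life follows as ln 2/k. The sketch below uses this log-linear least-squares approach. It is one common way to fit such data and is not necessarily the fitting routine used for Table 9. The time points and activities are synthetic values generated for a 14 h half-life.

```python
import math


def half_life_hours(times_h, activities):
    """Fit ln(activity) = ln(A0) - k*t by ordinary least squares; return ln(2)/k."""
    logs = [math.log(a) for a in activities]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    k = -sxy / sxx  # first-order decay constant, h^-1
    return math.log(2) / k


# Synthetic, noise-free data with a 14 h half-life (cf. enzyme 401 in Table 9).
k_true = math.log(2) / 14
times = [0, 4, 8, 24, 48]
acts = [math.exp(-k_true * t) for t in times]
print(f"half-life = {half_life_hours(times, acts):.1f} h")
```

With noisy replicate data, a nonlinear fit of the untransformed activities weights the points differently from this log-linear fit, so the two approaches can give somewhat different half-lives.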

TABLE 9
Half-lives of GAT polypeptides at 37.5° C.

Enzyme | Half-life (hours)
401 (SEQ ID NO: 6) | 14
17-15H3 (SEQ ID NO: 601) | 45
20-8H12 (SEQ ID NO: 739) | 54
22-13B12 (SEQ ID NO: 781) | 67
22-15B4 (SEQ ID NO: 946) | 26
22-18C5 (SEQ ID NO: 795) | 43

Example 17 Production of Glyphosate-Resistant Soybean Expressing GAT Transgenes

Soybean plants expressing GAT variant transgenes were produced by particle gun bombardment (see Klein et al. (1987) Nature 327:70-73) using a DuPont Biolistic PDS1000/He instrument. The selection agent used during the transformation process was hygromycin. Either the hygromycin selectable marker gene remained in the transgenic events, or the hygromycin gene was excised by methods known in the art. DNA fragments were prepared with a synthetic constitutive promoter, a GAT variant and the PinII terminator. The selectable marker gene, comprising the 35S CaMV promoter, the HPT gene and the NOS terminator, was co-bombarded with the GAT gene variant as described above. Bombarded soybean embryogenic suspension tissue was cultured for one week in the absence of selection agent. The embryogenic suspension tissue was then placed in liquid selection medium for 6 weeks. Putative transgenic suspension tissue was sampled for PCR analysis to determine the presence of the GAT gene. Putative transgenic suspension culture tissue was maintained in selection medium for 3 weeks to obtain enough tissue for plant regeneration. Suspension tissue was matured for 4 weeks using standard procedures; matured somatic embryos were desiccated for 4-7 days and then placed on germination induction medium for 2-4 weeks. Germinated plantlets were transferred to soil in cell pack trays for 3 weeks for acclimatization. Plantlets were potted into 10-inch pots in the greenhouse for evaluation of glyphosate resistance.

To determine the level of glyphosate resistance of transgenic soybeans expressing the GAT variant transgenes, T0 plants were sprayed with glyphosate (Roundup ULTRA MAX™) in the greenhouse. Plant resistance levels were evaluated by plant discoloration scores and plant height measurements.

Discoloration score at 2 weeks after spray with glyphosate:
9 = no leaf/stem discoloration
7 = minor leaf/stem discoloration
5 = worse leaf/stem discoloration
3 = severely discolored plant or dying plant
1 = dead plant

One to four plants were sent to the greenhouse from each independent transgenic event. An additional 1-2 plants per event were grown in controlled-environment growth chambers for seed production and were not sprayed with glyphosate. The greenhouse plants were sprayed with 1×, 2× or 4× glyphosate (1× glyphosate = 26 ounces/acre of RoundUp ULTRA MAX™) 3-4 weeks after transfer to soil. The T0 plant discoloration scores with the 2× and 4× spray rates are shown in Table 10 and Table 11, respectively.

These results show that soybeans are effectively transformed with GAT gene variants, as confirmed by PCR analysis. Transgenic soybeans expressing GAT gene variants are resistant to glyphosate at 2× and 4× spray rates. Events surviving the 4× glyphosate spray rate show some minor leaf discoloration; however, within 2 weeks of the spray test the plants recover and show normal leaf morphology.

TABLE 10
Resistance scores at 10 days after treatment with 2× glyphosate

Construct | # events tested with 2× | % (#) events @ 7-8 | % (#) events @ 3-6
SEQ ID NO: 193 | 27 | 15% (4) | 11% (3)
SEQ ID NO: 824 | 38 | 8% (3) | 74% (23)

TABLE 11
Resistance scores at 10 days after treatment with 4× glyphosate

Construct | # events tested with 4× | % (#) events @ 7-8 | % (#) events @ 3-6
SEQ ID NO: 824 | 23 | 8% (2) | 43% (10)

Example 18 Effect of Salt on GAT Kinetics

To better approximate the physiological conditions under which the GAT enzymes of the invention are intended to be used (e.g., plant cells), the activities of some GAT enzymes of the invention were re-evaluated in the presence of added salt. FIGS. 15A and 15B provide a comparison of the kinetic parameters K_(m) and k_(cat)/K_(m), respectively, for native GAT enzymes GAT401 (SEQ ID NO:6), B6 (SEQ ID NO:7), and DS3 (SEQ ID NO:8), and evolved GAT enzymes 0_6D10 (SEQ ID NO:448), 10_4F2 (SEQ ID NO:454), 18-28D9 (SEQ ID NO:618), 17-15H3 (SEQ ID NO:601), 17-10B3 (SEQ ID NO:592), 20-8H12 (SEQ ID NO:739), 20-16A3 (SEQ ID NO:639), and 20-30C6 (SEQ ID NO:683), assayed either in the absence of added KCl (unshaded bars) or in the presence of 20 mM KCl (shaded bars). Protein concentrations were determined using the Bradford assay as described in Example 7. Owing to their extremely low K_(m) values for glyphosate in the absence of KCl, the kinetic parameters for evolved GAT enzymes 0_6D10, 18-28D9 and 20-8H12 in the absence of KCl were determined using the mass spectrometry assay described in Example 3, while all other kinetic parameters (in either the absence or presence of KCl) were determined using the continuous spectrophotometric assay described in Example 7. Error bars represent the standard deviation of multiple assays, where available. FIG. 15A shows that the addition of salt (20 mM KCl) to the assay buffer significantly increases the K_(m) value for glyphosate. The k_(cat) value remains relatively unchanged or increases slightly, so the net result is a lower observed k_(cat)/K_(m) value for GAT enzymes assayed in the presence of 20 mM KCl than in the absence of added KCl (FIG. 15B).

Example 19 Further Evolved GAT Genes Encoding GAT Enzymes with Extremely High Activity

Additional iterations of directed molecular evolution yielded further evolved gat genes encoding GAT enzymes with extremely high GAT activity, e.g., exhibiting one or more improved properties such as reduced K_(m) for glyphosate, increased k_(cat), or increased k_(cat)/K_(m) compared to previously described GAT enzymes.

The further evolved gat genes were first selected for growth in E. coli in minimal M9 medium as described in Example 8, except that 5 mM rather than 1 mM glyphosate was used in the selection. Proteins were purified as described in Example 6 above.

Protein concentrations were determined by UV absorbance at 205 nm. The extinction coefficient was determined by the method described by Scopes (1994; Protein Purification, Principles and Practice, Springer, New York) according to the formula E₂₀₅ (ml mg⁻¹ cm⁻¹) = 27 + 120 × (A₂₈₀/A₂₀₅) = 30.5. Prior to quantitation by UV absorbance, the protein solution was buffer-exchanged into 50 mM Na₂SO₄ using a NAP-5 column (Amersham-Pharmacia Biotech).
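The Scopes formula estimates the absorbance at 205 nm of a 1 mg/ml protein solution in a 1 cm cell. Dividing a sample's A₂₀₅ by that value gives its protein concentration. A sketch follows. The absorbance readings are hypothetical and were chosen so that the estimate equals the value of 30.5 reported above.

```python
def scopes_e205(a280: float, a205: float) -> float:
    """Estimated A205 of a 1 mg/ml protein solution (1 cm path), per Scopes."""
    return 27.0 + 120.0 * (a280 / a205)


def protein_mg_per_ml(a205: float, e205: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert conversion of A205 to protein concentration in mg/ml."""
    return a205 / (e205 * path_cm)


# Hypothetical readings for a sample exchanged into 50 mM Na2SO4.
a280, a205 = 0.0175, 0.6
e205 = scopes_e205(a280, a205)
print(f"E205 = {e205:.1f} ml mg^-1 cm^-1")
print(f"protein = {protein_mg_per_ml(a205, e205):.4f} mg/ml")
```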

Exemplary further evolved gat coding sequences comprise nucleic acid sequences identified herein as SEQ ID NOs: 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, 892, 894, 896, 898, 900, 902, 904, 906, 908, 910, 912, 914, 916, 918, 920, 922, 924, 926, 928, and 930, which encode further evolved GAT enzymes comprising amino acid sequences identified herein as SEQ ID NOs: 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893, 895, 897, 899, 901, 903, 905, 907, 909, 911, 913, 915, 917, 919, 921, 923, 925, 927, 929, and 931, respectively. Some of these further evolved GAT enzymes exhibit extremely high GAT activity, in that they exhibit one or more improved properties, such as reduced K_(m) for glyphosate, increased k_(cat), or increased k_(cat)/K_(m), compared to previously described GAT enzymes assayed under the same conditions.

FIGS. 16A, 16B and 16C compare the kinetic parameters K_(m), k_(cat), and k_(cat)/K_(m), respectively, of several previously described GAT enzymes (unshaded bars) with those of some further evolved GAT enzymes of the invention (shaded bars), assayed using the continuous spectrophotometric assay in the presence of 20 mM KCl, with protein quantified via UV absorbance as described above. Error bars represent the standard deviation of multiple assays, where available. Under these assay conditions, native GAT enzyme GAT401 (SEQ ID NO:6) exhibited a K_(m) for glyphosate of about 4 mM, a k_(cat) of about 5.4 min⁻¹, and a k_(cat)/K_(m) of about 1.35 mM⁻¹ min⁻¹. When assayed under these conditions, some further evolved GAT enzymes of the invention (shaded bars) exhibit K_(m) values for glyphosate of less than about 0.4 mM (such as between about 0.1 mM and about 0.4 mM), k_(cat) values of at least about 1000 min⁻¹ (such as between about 1000 min⁻¹ and about 2500 min⁻¹), and k_(cat)/K_(m) values of at least about 4800 mM⁻¹ min⁻¹ (such as between about 4800 mM⁻¹ min⁻¹ and about 8000 mM⁻¹ min⁻¹). For example, some further evolved GAT enzymes of the invention exhibit at least about a 7000-fold increase in k_(cat)/K_(m) over native GAT enzyme GAT401 under these assay conditions.

Some further evolved GAT enzymes of the invention comprise, at one or more amino acid positions, residues not observed in previously described GAT polypeptides and GAT enzymes, such as: at position 27, a B1, Z1 or A amino acid residue; at position 33, an N or G amino acid residue; at position 46, a B2, Z4, or H amino acid residue; and at position 93, an R amino acid residue; where B1 is an amino acid selected from the group consisting of A, I, L, M, F, W, Y and V; B2 is an amino acid selected from the group consisting of R, N, D, C, Q, E, G, H, K, P, S, and T; Z1 is an amino acid selected from the group consisting of A, I, L, M and V; and Z4 is an amino acid selected from the group consisting of R, H and K. For example, some further evolved GAT enzymes of the invention comprise one or more of: an Ala at position 27 (i.e., Ala27); an Asn or a Gly at position 33 (i.e., Asn33 or Gly33); a His at position 46 (i.e., His46); and an Arg at position 93 (i.e., Arg93), with sequence numbering corresponding to that of, e.g., SEQ ID NO: 907.

Sequence/activity analyses were performed to identify amino acid residues that correlate positively with a high k_(cat)/K_(m) (as manifested by a high k_(cat), a low K_(m), or both). Amino acid residues that appear to correlate positively with a high k_(cat)/K_(m) include Glu14, Asp32, Asn33, Gly38, and Thr62 (sequence numbering corresponding to that of SEQ ID NO:907). Additional GAT enzymes may be constructed by substituting codons for one or more of these residues into the appropriate position(s) of a coding sequence of a template GAT polypeptide. For example, additional GAT enzymes were generated by substituting one or more codons encoding Glu at codon position 14, Asp at position 32, Asn at position 33, Gly at position 38, and Thr at position 62 into a nucleic acid sequence encoding a template polypeptide, such as GAT 24-5H5 (SEQ ID NO:845) or GAT 25-8H7 (SEQ ID NO:907), two of the further evolved GAT enzymes exhibiting extremely high activity as described above. Exemplary further evolved GAT enzymes generated in this manner, identified herein as R12G1 (SEQ ID NO:917), R12G2 (SEQ ID NO:919), R12G3 (SEQ ID NO:921), R12G4 (SEQ ID NO:923), R12G5 (SEQ ID NO:925), R12G6 (SEQ ID NO:927), R12G7 (SEQ ID NO:929), and R12G8 (SEQ ID NO:931), encoded by nucleic acids identified as SEQ ID NOs: 916, 918, 920, 922, 924, 926, 928, and 930, respectively, exhibited extremely high GAT activities comparable to those of the template polypeptides.
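Introducing a favorable residue at a given position amounts to replacing the codon at that position of the template coding sequence and then confirming the translation. The sketch below does this for the positions listed above (Glu14, Asp32, Asn33, Gly38, Thr62). The 70-codon template is a hypothetical placeholder, not SEQ ID NO:845 or SEQ ID NO:907, and the specific codon chosen for each residue is an arbitrary illustrative choice.

```python
# Standard genetic code; bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))


def substitute_codons(cds: str, changes: dict) -> str:
    """Return a copy of `cds` with codons replaced at 1-based codon positions."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for pos, codon in changes.items():
        codons[pos - 1] = codon
    return "".join(codons)


# Hypothetical 70-codon template (not a real GAT sequence): Met followed by Ala codons.
template = "ATG" + "GCT" * 69
favorable = {14: "GAA", 32: "GAT", 33: "AAT", 38: "GGT", 62: "ACT"}  # E, D, N, G, T
variant = substitute_codons(template, favorable)
protein = translate(variant)
print({pos: protein[pos - 1] for pos in favorable})
```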

Example 20 Amino Acids that Correlate with High GAT Activity

The amino acids aspartic acid (Asp, D), histidine (His, H) and cysteine (Cys, C) are known to be associated with the active sites of various acetyltransferase enzymes. To determine whether any such residues play a role in GAT activity, all D, C, and H residues of GAT 20-30C6 (SEQ ID NO:683) were individually mutated to alanine (Ala, A), and the mutated enzymes were assayed for glyphosate N-acetyltransferase activity. Variants containing the substitutions D34A and H41A retained only about 2%-3% of the activity of the unmodified enzyme, while the variant containing the substitution H138A exhibited essentially no measurable GAT activity. On the other hand, variants containing the substitutions H138R and H138S retained low but measurable GAT activity (particularly at pH values greater than 6.8), suggesting that His (and nominally Arg and Ser) at position 138 may serve as an active-site base.

TABLE 12
Conditions: no added KCl; protein quantified by Bradford assay

Enzyme | k_(cat) (min⁻¹) | K_(m) (mM) | k_(cat)/K_(m) (mM⁻¹ min⁻¹) | % k_(cat)/K_(m) of 20-30C6
20-30C6 (SEQ ID NO: 683) | 386 | 0.182 | 2122 | 100
20-30C6 H41A | 208 | 4.80 | 43 | 2.0
20-30C6 D34A | 127 | 2.33 | 54 | 2.6
20-30C6 H138A | <0.02 | nd | nd | <0.005
20-30C6 H138R | 44.3 | 17.8 | 2.49 | 0.12
20-30C6 H138S | 5.35 | 7.1 | 0.75 | 0.03
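The last two columns of Table 12 follow directly from the measured k_(cat) and K_(m) values. The sketch below recomputes them from the tabulated inputs. Small differences from the printed values (for example, about 2121 rather than 2122 for 20-30C6) are expected because the tabulated k_(cat) and K_(m) are themselves rounded.

```python
def catalytic_efficiency(kcat_per_min: float, km_mM: float) -> float:
    """k_cat/K_m in mM^-1 min^-1."""
    return kcat_per_min / km_mM


def percent_of_parent(kcat, km, parent_kcat, parent_km):
    """Catalytic efficiency of a variant as a percentage of its parent's."""
    return 100.0 * catalytic_efficiency(kcat, km) / catalytic_efficiency(parent_kcat, parent_km)


# (k_cat in min^-1, K_m in mM) from Table 12; H138A is omitted because its
# K_m could not be determined.
TABLE12 = {
    "20-30C6": (386, 0.182),
    "20-30C6 H41A": (208, 4.80),
    "20-30C6 D34A": (127, 2.33),
    "20-30C6 H138R": (44.3, 17.8),
    "20-30C6 H138S": (5.35, 7.1),
}

parent = TABLE12["20-30C6"]
for name, (kcat, km) in TABLE12.items():
    eff = catalytic_efficiency(kcat, km)
    pct = percent_of_parent(kcat, km, *parent)
    print(f"{name:<14} kcat/Km = {eff:8.2f} mM^-1 min^-1   {pct:7.3f}% of 20-30C6")
```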

Example 21 Improving GAT Expression in Plants

Plants, animals, and microbes are known to have specific codon preferences that affect the efficiency of amino acid incorporation during translation of gene transcripts. Rare codons can cause problems with tRNA recruitment during translation, which can in turn lead to lower accumulation of the encoded protein. The original parental gat genes came from bacteria such as Bacillus licheniformis and, as such, may not have an optimal codon distribution for expression in plants. Evolved gat genes of the invention have been successfully expressed in plants (see, e.g., Examples 9, 11, 13, and 17, above), but an opportunity remains to improve protein production by increasing translation efficiency in plants. One way to accomplish this is to replace one or more codons in the gat coding sequence that are used infrequently in plants with codons for the same amino acid(s) that are used more frequently in plants. This generates silent mutations in the gat coding sequence and leaves the sequence of the encoded protein unchanged.

Tables showing the frequency of codon usage in corn, cotton and soybeans (available, for example, from the website maintained by the Kazusa DNA Research Institute, Chiba, Japan) were compared to generate the following table (Table 13), which shows codons that are, in general, more frequently or less frequently utilized in either monocot or dicot plants.

TABLE 13
Codons more frequently and less frequently utilized in plants

Amino acid | Codons (more frequently utilized in plants first, then less frequently utilized in plants)
Alanine Ala A | GCA GCC GCT GCG
Cysteine Cys C | TGC TGT
Aspartic acid Asp D | GAC GAT
Glutamic acid Glu E | GAA GAG
Phenylalanine Phe F | TTC TTT
Glycine Gly G | GGA GGT GGC GGG
Histidine His H | CAC CAT
Isoleucine Ile I | ATC ATT ATA
Lysine Lys K | AAA AAG
Leucine Leu L | TTG CTC CTG CTT CTA TTA
Methionine Met M | ATG
Asparagine Asn N | AAC AAT
Proline Pro P | CCA CCT CCC CCG
Glutamine Gln Q | CAA CAG
Arginine Arg R | AGA AGG CGA CGC CGG CGT
Serine Ser S | AGT TCA TCC TCT AGC TCG
Threonine Thr T | ACA ACC ACT ACG
Valine Val V | GTG GTT GTA GTC
Tryptophan Trp W | TGG
Tyrosine Tyr Y | TAC TAT

A second way to increase plant expression of microbial genes is to increase the G+C content near the initiating methionine codon. Naturally occurring coding sequences in plants tend to contain two or three G and/or C residues immediately downstream of the ATG initiation codon (Joshi et al. (1997) Plant Mol. Biol. 35:993-1001). Introducing one or two CG-rich codons into the gat coding sequence immediately downstream of the ATG initiation codon may create a more plant-like coding sequence and thus may enhance its expression in plants. Replacing the second codon (isoleucine, ATA) with an alanine codon (GCG) resulted in an Ile2Ala variant with reduced k_(cat) compared to the unmodified enzyme. In contrast, inserting an alanine codon (either GCG or GCT) between the codons for Met at codon position 1 and Ile at codon position 2 resulted in a gat coding sequence encoding a GAT enzyme with an Ala residue inserted between the Met at position 1 and the Ile at position 2. An exemplary GAT enzyme variant containing two alanines inserted between Met1 and Ile2, denoted 22-15B4 M1MAA (to signify the insertion of two Ala residues immediately following the Met at position 1) and having the protein sequence SEQ ID NO:948, exhibited a reduced k_(cat) compared to the unmodified enzyme 22-15B4 (SEQ ID NO:789). An exemplary GAT enzyme containing one alanine inserted between Met1 and Ile2, denoted 22-15B4 M1MA (to signify the insertion of an Ala residue immediately following the Met at position 1) and having the protein sequence SEQ ID NO:946, exhibited essentially unaltered kinetics compared to the unmodified enzyme 22-15B4.

A general strategy for improving GAT expression in plants was developed. Evolved gat coding sequences may be altered by replacing codons less frequently utilized in plants with codons more frequently utilized in plants, for example according to Table 13. Codons less frequently utilized in plants (e.g., according to Table 13) should generally be avoided. In this manner at least one codon (such as at least three codons, at least five codons, or at least ten codons) in the gat coding sequence may be changed from a codon less frequently utilized in plants to a codon more frequently utilized in plants. The replaced codons may be located at the 5′ end of the gat coding sequence (e.g., within the first 10, 20, 50, or 100 codons). Alternatively, the replaced codons may be located throughout the gat coding sequence. Furthermore, the more frequently utilized codons may be chosen so that the coding sequence contains no more than about 5-10 (e.g., about 5, 6, 7, 8, 9 or 10) consecutive G+C or A+T residues. The coding sequence may also be altered to contain one or two CG-rich codons immediately downstream of the ATG initiation codon, for example by inserting an Ala codon (e.g., a frequently utilized Ala codon) immediately downstream of and adjacent to the initiating Met codon of the gat coding sequence.
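The strategy above can be sketched with a few helper functions. The sketch swaps less frequently utilized codons for synonymous, more frequently utilized ones (optionally only within the first N codons), inserts an Ala codon after the ATG, measures the longest run of consecutive G+C residues, and confirms by translation that the encoded protein is unchanged apart from any inserted Ala. The replacement map contains only the two swaps documented in Table 14 (Ser TCG -> TCT and Arg CGG -> AGG); a fuller map would be drawn from Table 13. The short coding sequence is a hypothetical toy, not a gat sequence.

```python
# Standard genetic code; bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
                for i, a in enumerate(BASES)
                for j, b in enumerate(BASES)
                for k, c in enumerate(BASES)}

# Less frequently -> more frequently utilized codon; both swaps appear in Table 14.
PREFERRED = {"TCG": "TCT", "CGG": "AGG"}


def split_codons(cds):
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    return "".join(GENETIC_CODE[c] for c in split_codons(cds))


def replace_rare_codons(cds, preferred=PREFERRED, first_n=None):
    """Swap codons using `preferred`; if `first_n` is set, only in the first N codons."""
    out = []
    for index, codon in enumerate(split_codons(cds)):
        if first_n is None or index < first_n:
            codon = preferred.get(codon, codon)
        out.append(codon)
    return "".join(out)


def insert_after_start(cds, codon="GCG"):
    """Insert one codon (default: Ala GCG) immediately after the ATG start codon."""
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must begin with ATG")
    return cds[:3] + codon + cds[3:]


def longest_run(seq, alphabet):
    """Length of the longest stretch of consecutive bases drawn from `alphabet`."""
    best = current = 0
    for base in seq:
        current = current + 1 if base in alphabet else 0
        best = max(best, current)
    return best


original = "ATGATATCGCGGGTGAAATAA"  # hypothetical toy CDS: M I S R V K stop
optimized = replace_rare_codons(original)
with_ala = insert_after_start(optimized)

print(translate(original), translate(optimized), translate(with_ala))
print("longest G+C run:", longest_run(original, "GC"), "->", longest_run(optimized, "GC"))
```

In this toy sequence, the two swaps shorten the longest G+C run from 6 to 3 while leaving the encoded protein unchanged. The `first_n` argument restricts replacements to the 5′ region of the coding sequence.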

Table 14 provides exemplary gat coding sequences altered as described above.

TABLE 14

Original gat coding sequence | Altered gat coding sequence | Codon changes made | Encoded protein
GAT20-8H12/4604 (SEQ ID NO: 738) | GAT4604SR (SEQ ID NO: 932) | Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 20-8H12 (SEQ ID NO: 739)
GAT22-18C5/4609 (SEQ ID NO: 794) | GAT4609SR (SEQ ID NO: 933) | Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 22-18C5 (SEQ ID NO: 795)
GAT22-16D8/4610 (SEQ ID NO: 792) | GAT4610R (SEQ ID NO: 934) | Arg 111 CGG -> AGG | 22-16D8 (SEQ ID NO: 793)
GAT22-15B4/4611 (SEQ ID NO: 788) | GAT4611R (SEQ ID NO: 935) | Arg 111 CGG -> AGG | 22-15B4 (SEQ ID NO: 789)
GAT24-5H5/4614 (SEQ ID NO: 848) | GAT4614SR (SEQ ID NO: 936) | Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 24-5H5 (SEQ ID NO: 849)
GAT24-5H5/4614 (SEQ ID NO: 848) | GAT4614VSR (SEQ ID NO: 937) | Val 4 GTG -> GTA; Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 24-5H5 (SEQ ID NO: 849)
GAT23-2H11/4615 (SEQ ID NO: 836) | GAT4615R (SEQ ID NO: 938) | Arg 111 CGG -> AGG | 23-2H11 (SEQ ID NO: 837)
GAT24-15C3/4616 (SEQ ID NO: 862) | GAT4616R (SEQ ID NO: 939) | Arg 111 CGG -> AGG | 24-15C3 (SEQ ID NO: 863)
GAT23-6H10/4617 (SEQ ID NO: 844) | GAT4617R (SEQ ID NO: 940) | Arg 111 CGG -> AGG | 23-6H10 (SEQ ID NO: 845)
GAT25-8H7/4618 (SEQ ID NO: 906) | GAT4618SR (SEQ ID NO: 941) | Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 25-8H7 (SEQ ID NO: 907, 957)
GAT25-8H7/4618 (SEQ ID NO: 906) | GAT4618VSR (SEQ ID NO: 942) | Val 4 GTG -> GTA; Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 25-8H7 (SEQ ID NO: 907)
GAT25-19C8/4619 (SEQ ID NO: 912) | GAT4619SR (SEQ ID NO: 943) | Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 25-19C8 (SEQ ID NO: 913)
GAT25-19C8/4619 (SEQ ID NO: 912) | GAT4619VSR (SEQ ID NO: 944) | Val 4 GTG -> GTA; Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 25-19C8 (SEQ ID NO: 913)
GAT22-15B4/4611 (SEQ ID NO: 788) | GAT4611A (SEQ ID NO: 945) | Ala codon inserted between Met1 and Ile2 | 22-15B4 M1MA (SEQ ID NO: 946)
GAT22-15B4/4611 (SEQ ID NO: 788) | GAT4611AA (SEQ ID NO: 947) | 2 Ala codons inserted between Met1 and Ile2 | 22-15B4 M1MAA (SEQ ID NO: 948)
GAT25-8H7/4618 (SEQ ID NO: 906) | GAT4618A (SEQ ID NO: 949) | Ala codon inserted between Met1 and Ile2 | 25-8H7 M1MA (SEQ ID NO: 950)
GAT25-8H7/4618 (SEQ ID NO: 906) | GAT4620 (SEQ ID NO: 951) | Ala codon inserted between Met1 and Ile2, and first 7 codons = more frequently utilized codons; plus Ser 62 TCG -> TCT; Arg 111 CGG -> AGG | 25-8H7 M1MA (SEQ ID NO: 950)
GAT25-8H7/4618 (SEQ ID NO: 906) | GAT4621 (SEQ ID NO: 952) | Ala codon inserted between Met1 and Ile2, and more frequently utilized codons throughout | 25-8H7 M1MA (SEQ ID NO: 950)

A binary vector with a dMMV-gat-UBQ3 cassette in the T-DNA was transformed into competent Agrobacterium tumefaciens strain C58 cells by electroporation (McCormac et al., Mol. Biotechnol. 9:155-159, 1998). After growth on LB + 40 µg/ml kanamycin plates for 2 days at 28° C., colonies were inoculated into LB + 40 µg/ml kanamycin liquid medium and shaken overnight at 28° C. The Agrobacterium cells were collected by centrifugation at 4000 g for 10 minutes and then resuspended in a volume of 10 mM MgSO₄ equal to the initial culture volume. This bacterial suspension was forced, or “infiltrated”, into the intercellular spaces of Nicotiana benthamiana leaves using a 1 ml plastic syringe (with no needle). By infiltrating 200-300 µl of bacterial suspension into each spot (typically 3-4 cm² in infiltrated area), 4 or more spots could be arranged on a single leaf still attached to the plant. In some cases the gat-containing Agrobacterium strain was diluted 5:1 or 10:1 with a second Agrobacterium strain lacking gat prior to infiltration. This dilution step reduces the overall expression of the gat gene in the plant cells, which prevents saturation and makes expression differences between variants and constructs easier to visualize. After 3 days the leaf material was ground, extracted in aqueous buffer, and centrifuged. The supernatant, containing the soluble proteins, was subjected to SDS-PAGE, and the gel was blotted and probed with an anti-GAT polyclonal antibody.

The level of GAT protein accumulated in tobacco leaves infiltrated with the GAT4620 gene was comparable to the level of protein accumulated in leaves transformed with the unmodified GAT25-8H7/4618 gene. Tobacco leaves harboring the GAT4621 gene, on the other hand, exhibited about two-fold greater GAT protein accumulation, as a percent of total protein, compared to leaves expressing the unmodified GAT25-8H7/4618 gene.

Example 22 T1 Studies of Glyphosate-Resistant Soybean Expressing GAT Transgenes

Soybean plants expressing GAT transgene 18-28D9c (SEQ ID NO:824) were produced using the methods described in Example 17. T1 seed was collected from glyphosate-sprayed T0 plants. T1 seeds were germinated under greenhouse conditions in RediEarth® 360 medium, available from Scotts, Marysville, Ohio, and sprayed at the V2-V3 stage with either 2× or 4× glyphosate (RoundUp ULTRA MAX™, available from Monsanto, St. Louis, Mo.) according to the methods described in Example 17. Plants were scored after 10 days, and leaf discoloration scores were taken as described in Example 17. The T1 greenhouse spray data correlated well with previous greenhouse results at the T0 plant stage. T2 seed was collected for further studies.

Example 23 Production of Glyphosate and Sulfonamide Resistant Soybeans Expressing GAT and HRA Transgenes

Soybean plants expressing GAT and HRA genes, where HRA is a high resistance allele of acetolactate synthase (U.S. Pat. Nos. 5,605,011, 5,378,824, 5,141,870, and 5,013,659), were produced using the methods described in Example 17. The HRA gene was used as the selectable marker gene for transformation. The selection agent was chlorsulfuron at a concentration of 100 ng/ml. The selectable marker gene consisted of the S-adenosyl-L-methionine synthetase (SAMS) promoter from Glycine max (U.S. 2003/226166), the HRA coding sequence from Glycine max, and the acetolactate synthase terminator from Glycine max. The selectable marker gene was either linked to or co-bombarded with a GAT construct consisting of a synthetic constitutive promoter (U.S. Pat. Nos. 6,072,050 and 6,555,673) or the maize Histone 2B promoter (U.S. Pat. No. 6,177,611), a GAT variant (18-28D9c (SEQ ID NO:824)), and the Pin II terminator (Gynheung An et al., Plant Cell 1:115-122 (1989)). Transgenic plants were generated as described in Example 17. Levels of glyphosate resistance were determined as described in Example 17 using plant discoloration scores after 2× or 4× glyphosate application rates. The results shown in Table 15 demonstrate that different constitutive promoters driving the GAT variant (18-28D9c (SEQ ID NO:824)) confer glyphosate resistance in T0 plants.

TABLE 15 Resistance Scores at 10 days after treatment with 4X glyphosate.
Construct | # events tested with 4X | % (#) of events with score 3-6 | % (#) of events with score 7-8
PHP20163a (SEQ ID NO: 824) | 58 | 15.5% (9) | 77.6% (45)
PHP20558a (SEQ ID NO: 824) (H2B promoter) | 26 | 34.6% (9) | 42.3% (11)
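The percentages in Table 15 follow directly from the event counts. The short check below recomputes them from the transcribed counts; it also shows that 4 of the 58 PHP20163a events and 6 of the 26 PHP20558a events fall outside the two reported score bands, which the table does not break down further. The dictionary layout and the `percent` helper are illustrative only.

```python
# Event counts transcribed from Table 15 (4X glyphosate, scored 10 days after treatment).
TABLE_15 = {
    "PHP20163a": {"tested": 58, "score_3_6": 9, "score_7_8": 45},
    "PHP20558a (H2B promoter)": {"tested": 26, "score_3_6": 9, "score_7_8": 11},
}


def percent(count: int, total: int) -> float:
    """Share of tested events as a percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for construct, row in TABLE_15.items():
    # Events not counted in either reported score band.
    other = row["tested"] - row["score_3_6"] - row["score_7_8"]
    print(
        f"{construct}: {percent(row['score_3_6'], row['tested'])}% scored 3-6, "
        f"{percent(row['score_7_8'], row['tested'])}% scored 7-8, "
        f"{other} event(s) outside both bands"
    )
```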

Example 24 T1 Pre-Emergence Studies of Soybeans Expressing GAT and HRA Transgenes

T1 seed generated from experiments as described in Example 17 was planted in pots of Tama silt loam in the greenhouse. Pots were immediately sprayed with a pre-emergence application of chlorimuron, rimsulfuron, or tribenuron at a rate of 70 g a.i./hectare. Germinating plants were evaluated 10 days after spray application based on the plant discoloration scores described in Example 17. All HRA and GAT events survived all pre-emergence spray applications with a rating of 9 (uninjured). These results demonstrate pre-emergence resistance to sulfonamide chemistry in soybeans.

Example 25 T1 Post-Emergence Studies of Soybeans Expressing GAT and HRA Transgenes

T1 seed generated from experiments as described in Example 17 was germinated in RediEarth® 360 medium in the greenhouse. Plants were sprayed at the V2-V3 stage (14 days after potting) with thifensulfuron, chlorimuron, rimsulfuron, or tribenuron (70, 70, 35, and 35 g a.i./hectare, respectively). Plants were evaluated 10 days after application based on the plant discoloration scores described in Example 17. Results are shown in Table 16.

TABLE 16 Resistance Scores at 10 days after Post-Emergence Treatment with Sulfonamide Chemistry.
Treatment | Average resistance score, GAT (SEQ ID NO: 824)/SAMS promoter-HRA events
Unsprayed control | 9
Chlorimuron (70 g a.i./ha) | 7.75
Rimsulfuron (35 g a.i./ha) | 2.21
Tribenuron (35 g a.i./ha) | 3.83
Thifensulfuron (70 g a.i./ha) | 7.81
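One way to read Table 16 is to apply a tolerance cut-off to the average scores. The sketch below borrows the 7-8 rating that the text uses to select individual events for the follow-up glyphosate spray and applies it, as an assumption, to event averages instead; the constant and function names are hypothetical.

```python
# Average resistance scores (1-9 scale) transcribed from Table 16.
TABLE_16 = {
    "Chlorimuron (70 g a.i./ha)": 7.75,
    "Rimsulfuron (35 g a.i./ha)": 2.21,
    "Tribenuron (35 g a.i./ha)": 3.83,
    "Thifensulfuron (70 g a.i./ha)": 7.81,
}
UNSPRAYED_CONTROL = 9.0

# Assumed cut-off: the "7 or 8" rating used to pick events for the glyphosate
# follow-up, applied here to averages rather than to single events.
TOLERANCE_CUTOFF = 7.0


def tolerated(scores: dict, cutoff: float = TOLERANCE_CUTOFF) -> list:
    """Treatments whose average score meets the cut-off, sorted by name."""
    return sorted(name for name, score in scores.items() if score >= cutoff)


print(f"Unsprayed control: {UNSPRAYED_CONTROL}")
for name, score in TABLE_16.items():
    flag = "meets cut-off" if score >= TOLERANCE_CUTOFF else "below cut-off"
    print(f"{name}: {score:.2f} ({flag})")
```

Because the 1-9 scale is an ordinal rating, the sketch only compares scores against a threshold and does not compute ratios between them.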

Events having a plant discoloration rating of 7 or 8 after thifensulfuron spray were sprayed 10 days later with either a 2× or 4× application of glyphosate as per methods described in Example 17. Events were evaluated based on the discoloration scores described in Example 17. All thifensulfuron-tolerant events survived the glyphosate spray with a score of 7 or 8 (results not shown). Under greenhouse conditions, these results demonstrate 100% correlation between thifensulfuron tolerance conferred by the HRA gene (at 70 g a.i./hectare thifensulfuron) and glyphosate tolerance conferred by the GAT gene (at 2× glyphosate).

Example 26 T3 Studies of Glyphosate-Resistant Maize Plants Expressing GAT Transgenes

Maize plants expressing GAT transgenes 20-8H12 (SEQ ID NO:738) and 20-16A3 (SEQ ID NO:638) were produced using the methods described in Example 13. Plants were sprayed at the V4 leaf stage, scored after 10 days, and leaf discoloration scores were taken as described in Example 13. The plants were thinned to equal spacing and stand counts after application of spray treatments. Commercially available NK603 (Monsanto, St. Louis, Mo.) was used as a control. Resistance scores are shown in Table 18. Plant height measurements were also taken 10 days after treatment and are shown in Table 19.

TABLE 18 Resistance Scores at 10 days after treatment with glyphosate
Resistance scores (1-9 scale); glyphosate applied as UltraMax
Construct | # events tested | No glyphosate treatment (control) | 1X (26 oz/A) | 4X (104 oz/A) | 8X (208 oz/A)
19900 (SEQ ID NO: 738) | 2 | 9 | 8.8 | 7.9 | 5.4
19902 (SEQ ID NO: 638) | 4 | 9 | 8.4 | 8.0 | 5.9
NK603 | 1 | 9 | 8.4 | 7.9 | 5.5

TABLE 19 Plant Height (in inches) 10 days after treatment with glyphosate
Glyphosate applied as UltraMax
Construct | # events tested | No glyphosate treatment (control) | 1X (26 oz/A) | 4X (104 oz/A) | 8X (208 oz/A)
19900 (SEQ ID NO: 738) | 2 | 18.5 | 16.6 | 16.4 | 16.2
19902 (SEQ ID NO: 638) | 4 | 19.5 | 16.7 | 17.0 | 16.2
NK603 | 1 | 20.1 | 17.6 | 17.5 | 17.3
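Height set-back in Table 19 can be expressed as a percent reduction from each construct's own unsprayed control. The sketch below does only that arithmetic on the table means, so it says nothing about statistical significance; the names are illustrative.

```python
# Plant heights (inches) transcribed from Table 19: (control, 1X, 4X, 8X).
TABLE_19 = {
    "19900": (18.5, 16.6, 16.4, 16.2),
    "19902": (19.5, 16.7, 17.0, 16.2),
    "NK603": (20.1, 17.6, 17.5, 17.3),
}
RATES = ("1X", "4X", "8X")


def height_reduction(control: float, treated: float) -> float:
    """Percent height reduction relative to the unsprayed control."""
    return 100.0 * (control - treated) / control


for construct, (control, *treated) in TABLE_19.items():
    cuts = ", ".join(
        f"{rate}: {height_reduction(control, h):.1f}%" for rate, h in zip(RATES, treated)
    )
    print(f"{construct}: {cuts}")
```

By this arithmetic the reductions span roughly 10% to 17% across all three entries, including the NK603 control.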

Example 27 T3 Yield Studies of Glyphosate-Resistant Maize Expressing GAT Transgenes

T3 seed from Example 15 was used to generate T3 plants for the generation of glyphosate field tolerance data on hybrids. The experiment was conducted at Viluco, Chile, with four (4) replications using a split-plot design. Specifically, 3 entries were included. Two of the entries comprised maize plants expressing the GAT variant transgene 17-15H3 (SEQ ID NO:549). A glyphosate-resistant control, NK603, which is commercially available from Monsanto, was the third entry. All entries were treated in the field with four different glyphosate spray treatments (0×, 4× at V4, 8× at V4, and 4× at V4 plus 4× at V8) for each event. Plants were scored 10 days after treatment for plant height comparisons as described in Example 13. The T3 field spray data correlated well with the results previously obtained in the field as reported in Example 15. Specifically, all entries sprayed with 1× and 4× glyphosate were similar in height to unsprayed controls. At the higher rate of 4× at V4 plus 4× at V8, the GAT entries were temporarily set back between 12 and 17% in height and the NK603 entry was set back 6%; however, later in the season (during reproductive maturity) the height of glyphosate-treated entries was the same as that of the unsprayed entries. Moreover, yields of glyphosate-treated entries were neither numerically nor statistically reduced relative to unsprayed entries (LSD_(0.05)=11.8 bu./acre; average yield per entry=243 bu./acre). Similar results were observed in preliminary agronomic trials with T2 plants of the same events that were planted in Johnston, Iowa and York, Nebr. (data not shown).
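The yield statement rests on the least significant difference (LSD): two entry means are declared different at the 0.05 level only if they differ by more than the LSD. A minimal sketch using the two figures reported above, assuming a simple pairwise LSD comparison; the text does not give the individual entry means or the error term used in the split-plot analysis, and the helper names are hypothetical.

```python
LSD_05 = 11.8       # bu./acre, LSD at P = 0.05 reported in Example 27
TRIAL_MEAN = 243.0  # bu./acre, average yield per entry reported in Example 27


def differs(yield_a: float, yield_b: float, lsd: float = LSD_05) -> bool:
    """True if two entry means differ by more than the least significant difference."""
    return abs(yield_a - yield_b) > lsd


def lsd_percent_of_mean(lsd: float = LSD_05, mean: float = TRIAL_MEAN) -> float:
    """Smallest difference declared significant, as a percentage of the trial mean."""
    return 100.0 * lsd / mean


print(f"A yield difference must exceed about {lsd_percent_of_mean():.1f}% of the mean")
```

In other words, at this trial's precision a treated entry would have to yield roughly 5% less than its unsprayed counterpart before the reduction could be called significant.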

Example 28 T2 Studies of Glyphosate-Resistant Maize Expressing GAT Transgenes

Experiments were conducted on GAT positive and GAT negative iso-lines. Maize plants expressing GAT transgenes 18-28D9b (SEQ ID NO:814), 17-15H3 (SEQ ID NO:549), 20-8H12 (SEQ ID NO:738), and 20-16A3 (SEQ ID NO:638) were produced using the methods described in Example 17, and T2 plants were examined. GAT positive T2 plants were sprayed at V4 with 1× (26 oz/A ULTRA MAX™). GAT negative plants were PCR sampled at V4, and GAT positive plants were removed from the row. No glyphosate was applied to the GAT negative plants. Plants were thinned to create equal spacing among plants within each row. Four (4) replications were performed. Grain from five (5) ears harvested from the middle of each row was dried and weighed. As shown in Table 20, no yield reduction was detected for any of the constructs.

TABLE 20 Yield data.
Construct | # of events | Yield, GAT positive sprayed with 1X (26 oz/A) ULTRA MAX™ at V4 | Yield, GAT negative, no glyphosate applied
PHP19286 (SEQ ID NO: 814) | 40 | 1.65 lbs/5 ears | 1.57 lbs/5 ears
PHP19288 (SEQ ID NO: 549) | 40 | 1.64 lbs/5 ears | 1.60 lbs/5 ears
PHP19900 (SEQ ID NO: 738) | 6 | 1.20 lbs/5 ears | 1.23 lbs/5 ears
PHP19902 (SEQ ID NO: 638) | 4 | 1.19 lbs/5 ears | 1.21 lbs/5 ears
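The Table 20 comparison can be summarized as the percent difference between sprayed GAT positive plants and unsprayed GAT negative iso-lines. This is a sketch of the arithmetic only; the names are illustrative, and a formal test would need the replicate-level data, which are not given.

```python
# (events, GAT-positive sprayed, GAT-negative unsprayed); yields in lbs per 5 ears, Table 20.
TABLE_20 = {
    "PHP19286": (40, 1.65, 1.57),
    "PHP19288": (40, 1.64, 1.60),
    "PHP19900": (6, 1.20, 1.23),
    "PHP19902": (4, 1.19, 1.21),
}


def percent_difference(sprayed_pos: float, unsprayed_neg: float) -> float:
    """Yield of sprayed GAT-positive plants relative to unsprayed GAT-negative iso-lines (%)."""
    return 100.0 * (sprayed_pos - unsprayed_neg) / unsprayed_neg


for construct, (events, pos, neg) in TABLE_20.items():
    print(f"{construct} ({events} events): {percent_difference(pos, neg):+.1f}%")
```

The differences run from about -2.4% to +5.1% and fall in both directions, which is consistent with the statement that no yield reduction was detected.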

Example 29 Amino Acid Substrates of GAT Polypeptides

GAT activity of several GAT polypeptides of the present invention was evaluated with respect to a number of amino acid substrates. The GAT polypeptide, AcCoA, and amino acid substrate were incubated in 25 mM Hepes, pH 6.8, 10% ethylene glycol in the wells of a 96-well polystyrene plate. After 30 minutes, the reactions were stopped by the addition of 30 μl of 10 mM 5,5′-dithiobis-2-nitrobenzoate (DTNB) in 500 mM Tris, pH 7.5. After 2 minutes, absorbance was read at 412 nm in a Spectramax Plus plate reader (Molecular Devices, Sunnyvale, Calif.).
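Each acetyl group transferred from AcCoA releases one molecule of free CoA, whose thiol reacts with DTNB to give the yellow TNB anion read at 412 nm. A minimal Beer-Lambert sketch of turning the end-point reading into released CoA follows. The extinction coefficient is a commonly used literature value, not one stated here; the blank, path length, and reaction volume are hypothetical inputs, and the average rate assumes product formation was linear over the 30 minutes.

```python
TNB_EPSILON_412 = 14_150.0  # M^-1 cm^-1, commonly used value for TNB at 412 nm (assumption)


def thiol_uM(a412: float, blank_a412: float, path_cm: float,
             epsilon: float = TNB_EPSILON_412) -> float:
    """Free thiol (CoA-SH) concentration in the read volume, in micromolar (Beer-Lambert)."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return 1e6 * (a412 - blank_a412) / (epsilon * path_cm)


def correct_for_stop_dilution(conc_uM: float, reaction_ul: float,
                              stop_ul: float = 30.0) -> float:
    """Convert a read-volume concentration back to the reaction volume.

    reaction_ul is hypothetical here: the text does not give the reaction volume.
    """
    return conc_uM * (reaction_ul + stop_ul) / reaction_ul


def mean_rate_uM_per_min(conc_uM: float, minutes: float = 30.0) -> float:
    """Average CoA release rate over the end-point incubation (assumes linearity)."""
    return conc_uM / minutes
```

A blank such as a well lacking the amino acid substrate would account for thiol-independent absorbance; that choice of blank is also an assumption.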

In addition to glyphosate, the native GAT polypeptide 401 (SEQ ID NO: 6) (or B6 (SEQ ID NO: 7), in the case of phosphoserine) exhibited detectable activity with 12 amino acids. The native GAT polypeptide was about as active with L-aspartate as with glyphosate, about 4.7 times more active with L-serine, and about 2 times more active with phospho-L-serine. Compared with the native GAT polypeptide, the non-native GAT polypeptides 17-15H3 (SEQ ID NO: 601) and 25-8H7 (SEQ ID NO: 907) exhibited a 40-fold increase in activity with aspartate, but loss of activity with respect to serine and phosphoserine.

In addition to aspartate and serine, the native GAT polypeptide showed activity, with each substrate present at 1 mM, of 3% or more of its activity toward glyphosate with the following L-amino acids: histidine (10%), tyrosine (18%), threonine (250%), valine (12%), glutamate (51%), asparagine (27%), glutamine (32%), alanine (33%), glycine (21%) and cysteine (50%). Activity with the other protein amino acids was either undetected or less than 3% of the activity toward glyphosate. No detectable activity of the native GAT polypeptide was observed with the N-methyl derivatives of L-aspartate (2 mM), L-alanine (10 mM) and glycine (i.e., sarcosine, 10 mM). The percentages refer to activity relative to the activity of the same GAT polypeptide toward glyphosate. Some of the data are shown in Table 21.

TABLE 21 GAT Activity with Respect to Glyphosate, Aspartate and Serine
GAT variant | Substrate | k_(cat) ± SE, min⁻¹ | K_(M) ± SE, mM | k_(cat)/K_(M), min⁻¹ mM⁻¹ | Fold imp. (glyphosate rows) or % of glyph (other rows)
401 | Glyphosate | 5.35 ± 0.043 | 1.27 ± 0.0144 | 4.21 | -
17-15H3 | Glyphosate | 1150 ± 27.6 | 0.251 ± 0.0041 | 4573 | 1086
25-8H7 | Glyphosate | 1480 ± 65.4 | 0.05 | 29600 | 7031
401 | Aspartate | 24.1 | 6.7 | 3.6 | 85.5
17-15H3 | Aspartate | 435 ± 11.3 | 2.95 ± 0.162 | 148 | 3.24
25-8H7 | Aspartate | 702 ± 12.4 | 4.56 ± 0.112 | 154 | 0.520
401 | Serine | 854 | 43 | 19.8 | 471
17-15H3 | Serine | 242 ± 15.8 | 60.1 ± 1.68 | 4.04 | 0.0882
25-8H7 | Serine | 388 ± 18.5 | 154 ± 10.7 | 2.53 | 0.00855
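The derived columns of Table 21 follow from k_(cat) and K_(M): catalytic efficiency is k_(cat)/K_(M), "Fold imp." divides a variant's glyphosate efficiency by that of native polypeptide 401, and "% of glyph" divides a variant's efficiency on another substrate by its own glyphosate efficiency. The sketch below recomputes these from the printed parameters; small differences from the printed values presumably reflect rounding in the table. The helper names are illustrative.

```python
# (k_cat in min^-1, K_M in mM) transcribed from Table 21.
TABLE_21 = {
    ("401", "glyphosate"): (5.35, 1.27),
    ("17-15H3", "glyphosate"): (1150.0, 0.251),
    ("25-8H7", "glyphosate"): (1480.0, 0.05),
    ("401", "aspartate"): (24.1, 6.7),
    ("17-15H3", "aspartate"): (435.0, 2.95),
    ("25-8H7", "aspartate"): (702.0, 4.56),
    ("401", "serine"): (854.0, 43.0),
    ("17-15H3", "serine"): (242.0, 60.1),
    ("25-8H7", "serine"): (388.0, 154.0),
}


def efficiency(variant: str, substrate: str) -> float:
    """Catalytic efficiency k_cat/K_M in min^-1 mM^-1."""
    kcat, km = TABLE_21[(variant, substrate)]
    return kcat / km


def fold_improvement(variant: str, reference: str = "401") -> float:
    """Glyphosate efficiency of a variant relative to the native polypeptide."""
    return efficiency(variant, "glyphosate") / efficiency(reference, "glyphosate")


def percent_of_glyphosate(variant: str, substrate: str) -> float:
    """Efficiency on another substrate as a percent of the same variant's glyphosate efficiency."""
    return 100.0 * efficiency(variant, substrate) / efficiency(variant, "glyphosate")


for variant in ("17-15H3", "25-8H7"):
    asp_gain = efficiency(variant, "aspartate") / efficiency("401", "aspartate")
    print(f"{variant}: {fold_improvement(variant):.0f}-fold on glyphosate, "
          f"{asp_gain:.0f}-fold on aspartate vs 401")
```

The aspartate ratio printed for each variant is the basis of the roughly 40-fold increase described in the text.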

Example 30 Effect of pH on GAT Activity

The pH optima of k_(cat) and K_(M) for the wild-type enzyme B6 (SEQ ID NO: 7) and GAT polypeptide 17-15H3 (SEQ ID NO: 601) were determined using the spectrophotometric assay described in Example 7, except that the assay buffer was 50 mM Hepes and 10% ethylene glycol, titrated to a range of pH values. Protein concentrations were determined by the UV absorbance assay described in Example 19. The effect of pH on K_(M) and k_(cat) is shown in FIG. 18 for clones B6 (SEQ ID NO: 7) and 17-15H3 (SEQ ID NO: 601).
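The text does not say how the pH profiles were analyzed. One common textbook way to summarize a bell-shaped k_(cat) (or k_(cat)/K_(M)) profile, swapped in here purely for illustration, is a two-ionization model in which activity requires one group to be deprotonated (pKa1) and another to be protonated (pKa2). The pKa values in the example are hypothetical, not fitted to FIG. 18.

```python
def bell_shaped_rate(ph: float, k_max: float, pka1: float, pka2: float) -> float:
    """Two-ionization (diprotic) pH model for k_cat or k_cat/K_M.

    Activity falls at low pH as the pKa1 group becomes protonated and at
    high pH as the pKa2 group becomes deprotonated.
    """
    return k_max / (1.0 + 10.0 ** (pka1 - ph) + 10.0 ** (ph - pka2))


def ph_optimum(pka1: float, pka2: float) -> float:
    """The model's maximum lies midway between the two pKa values."""
    return (pka1 + pka2) / 2.0


# Hypothetical parameters, only to show the shape of the curve.
for ph in (5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5):
    print(f"pH {ph}: {bell_shaped_rate(ph, 1.0, 6.0, 8.0):.3f} of k_max")
```

The model's apparent maximum never reaches k_max when the two pKa values are close, which is one reason fitted pKa values, rather than the raw peak height, are usually reported.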

While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention. For example, all the techniques, methods, compositions, apparatus and systems described above may be used in various combinations. The invention is intended to include all methods and reagents described herein, as well as all polynucleotides, polypeptides, cells, organisms, plants, crops, etc., that are the products of these novel methods and reagents.

All publications, patents, patent applications, or other documents cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent application, or other document were individually indicated to be incorporated by reference for all purposes.

1. A transgenic plant comprising: (a) a first nucleotide sequence encoding a first polypeptide which has glyphosate-N-acetyltransferase activity; and (b) a second nucleotide sequence encoding a high resistance allele of acetolactate synthase, wherein said plant exhibits tolerance to glyphosate as compared to a plant of the same species, strain or cultivar that does not comprise said first polynucleotide.
 2. The plant of claim 1, wherein said plant does not exhibit a reduction in yield following treatment with glyphosate applied at a level effective to inhibit the growth of a plant of the same species, strain, or cultivar that does not comprise said first polypeptide.
 3. The plant of claim 1, wherein said plant is a crop plant selected from the group of genera consisting of: Eleusine, Lolium, Bambusa, Brassica, Dactylis, Sorghum, Pennisetum, Zea, Oryza, Triticum, Secale, Avena, Hordeum, Saccharum, Coix, Glycine and Gossypium.
 4. A seed produced by the plant of claim 1, wherein said seed comprises the first and the second polynucleotide.
 5. A method for controlling weeds in a field containing a crop comprising: (a) planting the field with crop seeds or plants which comprise: (i) a first nucleotide sequence encoding a first polypeptide which has glyphosate-N-acetyltransferase activity; and (ii) a second nucleotide sequence encoding a high resistance allele of acetolactate synthase; and (b) applying to any crop and weeds in the field a sufficient amount of herbicide to control weeds without significantly affecting the crop.
 6. The method of claim 5, wherein said herbicide comprises glyphosate.
 7. The method of claim 5, wherein said herbicide comprises a sulfonylurea herbicide.
 8. The method of claim 7, wherein said sulfonylurea herbicide comprises at least one of the following: chlorimuron, chlorsulfuron, rimsulfuron, thifensulfuron, or tribenuron.
 9. The method of claim 8, wherein said herbicide comprises glyphosate and a sulfonylurea herbicide.
 10. The method of claim 9, wherein said sulfonylurea herbicide is selected from the group consisting of: a) chlorimuron; b) chlorsulfuron; c) rimsulfuron; d) thifensulfuron; and e) tribenuron.